CHANGE POINT DETECTION IN NETWORK MODELS:
PREFERENTIAL ATTACHMENT AND LONG RANGE DEPENDENCE 

SHANKAR BHAMIDI, JIMMY JIN, AND ANDREW NOBEL 


Abstract. Inspired by empirical data on real world complex networks, the last few years
have seen an explosion in proposed generative models to understand and explain observed
properties of real world networks, including power law degree distribution and “small world”
distance scaling. In this context, a natural question is the phenomenon of change point:
understanding how abrupt changes in parameters driving the network model change structural
properties of the network. We study this phenomenon in one popular class of dynamically
evolving networks: preferential attachment models. We derive asymptotic properties of various
functionals of the network including the degree distribution as well as maximal degree
asymptotics, in essence showing that the change point does affect the degree distribution but
does not change the degree exponent. This provides further evidence for long range dependence
and sensitive dependence of the evolution of the process on the initial evolution of the
process in such self-reinforced systems. We then propose an estimator for the change point
and prove consistency properties of this estimator. The methodology developed highlights
the effect of the non-ergodic nature of the evolution of the network on classical change point
estimators.


1. Introduction 


Motivated by the availability of data on many real world systems, the last few years have
witnessed an explosion in both methodological as well as theoretical development of various
complex network models. The aim of these models is to explain structural features observed
in the data (e.g. power law degree distribution or “small world” connectivity) as well as
understand and predict the behavior of dynamic processes on these networks including disease
contact networks, search algorithms, random walks, evolution and dissolution of communities
and a wide array of related processes [2, 11, 19, 23, 26, 42, 43, 58]. One sub-field of this vast
field which has been particularly active is temporal or time varying networks. See the recent
surveys [9, 31] and the references therein for both methodological developments as well as
applications in a wide array of fields ranging from social networks and online communication,
cell biology including temporal properties of protein interaction networks, and infrastructure
systems such as the power grid.

Many such proposed models are driven by a collection of parameters that describe the
evolution of the network. A natural question in this context is the study of change points:
the effect of abrupt changes in parameters driving the evolution of the network on structural
properties of the network. To fix ideas, first consider the simplest version of the classical
(offline) change point detection problem in the context of iid data, described as follows. Fix two
distribution functions $F$ and $G$ (unknown but different) and a parameter $\gamma \in (0,1)$. Consider
a stream of data $\{X_i : 1 \le i \le n\}$ with the following distribution: for $i \le \lfloor n\gamma \rfloor$, the $X_i$ are iid with
distribution $F$, whilst for $i > \lfloor n\gamma \rfloor$, the $X_i$ are iid with distribution $G$ (and independent of
the initial segment). Based on the observed data $\{X_i : 1 \le i \le n\}$, the aim is then to estimate
the change point $\gamma$ using estimators that are consistent as the sample size $n \to \infty$.

In this spirit, this paper has two main goals: 

(a) We start with a variation of the standard preferential attachment model of evolving
networks that incorporates a change point. This conceptually simple model allows for a
simple interpretation of the effect of the change point on network dynamics. We rigorously
study the effect of this change point on structural properties of the network including the
scale-free or heavy tailed nature of the limiting degree distribution as well as asymptotics
for the maximal degrees.

(b) We then propose and study consistency properties of offline estimation procedures to 
detect the location of this change point from observed data. In particular this allows one 
to gain insight into the effect of the non-stationary nature of the evolution of the network 
model on various known heuristics for estimation in the iid setting. 


1.1. Organization of the paper. Both change point detection as well as preferential
attachment models have witnessed an enormous amount of work over the last few decades. We
defer a fuller discussion of these two fields, their relevance to this paper as well as related work
to Section 3. We start in Section 1.2 by defining the model. In Section 1.3 we set up the notation
required for the main results. Section 2 contains our main results, starting with Section 2.1
that describes asymptotics for functionals of the networks including the degree distribution
as well as maximal degrees as the network size $n \to \infty$. Section 2.2 formulates estimators to
find the change point and states consistency properties for these estimators. Proofs for
asymptotics of network functionals can be found in Section 4. Section 5 develops a functional
central limit theorem for a specific functional of the network. Section 6 then uses this CLT
to prove consistency of the proposed estimator.


1.2. Model formulation. We start by describing the original model of preferential
attachment with no change point [4, 54, 60]. There are many variants of this model. Throughout
the paper we will consider the simplest case where the network at each stage is a tree. The
methodology can be generalized to the general network setup. Start with a single vertex at
time $m = 1$ (this vertex will be referred to as the root or the original progenitor of the process
and denoted by $\rho$). Fix a parameter $\alpha \ge 0$. At each discrete time point $1 < m \le n$ a new
vertex enters the system with a single edge which it will then connect to a pre-existing vertex.
The vertex connects to a pre-existing vertex $v$ with probability proportional to the current
degree of $v$ plus $\alpha$. Let $\mathcal{T}_m$ denote the graph at time $m$ and $\{\mathcal{T}_m : 1 \le m \le n\}$ be the entire graph
valued process. Note that since each new vertex has one edge which it uses to connect to the
current graph, $\mathcal{T}_m$ for any $m$ is a tree (which we view as rooted at $\rho$). Thus for $m > 1$, the
degree of every vertex is at least 1. Further, calling the pre-existing vertex that a new vertex
attaches to the parent of this vertex, one can view this process as generating a directed
tree with edges pointed from parents to children.

We will soon switch over to a continuous time version of the process where it is convenient
to work with a slight variant of the above process. Note that for a (directed rooted) tree, the
degree of every vertex other than the root is 1 + out-degree of the vertex. For the root, the
degree and the out-degree coincide. Now fix a single vertex at time $m = 1$ and a parameter
$\alpha > 0$. The variant considered in this paper is as follows: at each stage $m > 1$ a new vertex
enters the system and connects to a pre-existing vertex $v \in \mathcal{T}_{m-1}$ with probability proportional
to $1 + \alpha +$ out-degree of $v$ in $\mathcal{T}_{m-1}$. This variant results in all the same asymptotic properties
as the original model and is slightly easier to deal with rigorously.

This model has been studied extensively and in particular it is known [12] that the degree
distribution converges in the large network limit. Precisely, for fixed $k \ge 1$, let $N_n(k)$ denote
the number of vertices with degree $k$ in $\mathcal{T}_n$. Then,

$$\frac{N_n(k)}{n} \xrightarrow{P} p_\alpha(k), \qquad \text{where } p_\alpha(k) := (2+\alpha)\,\frac{\prod_{j=1}^{k-1}(j+\alpha)}{\prod_{j=3}^{k+2}(j+2\alpha)}. \tag{1.1}$$

Here for $k = 1$, we use the notation $\prod_{j=1}^{0} = 1$. Write $D_\alpha$ for a random variable with the
above distribution. It is easy to check that there exists a constant $c > 0$ such that

$$\mathbb{P}(D_\alpha \ge k) \sim \frac{c}{k^{\alpha+2}}, \qquad \text{as } k \to \infty. \tag{1.2}$$
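The limiting probabilities in (1.1) are easy to evaluate numerically. The following is a minimal sketch, not part of the original development; the function names `p_alpha` and `tail_alpha` are hypothetical. It rewrites the two products in (1.1) as ratios of Gamma functions and works on the log scale to avoid overflow for large $k$. The tail function uses the telescoping identity $\mathbb{P}(D_\alpha \ge k) = \prod_{j=1}^{k-1} (j+\alpha)/(j+2+2\alpha)$, which follows from (1.1).

```python
from math import exp, lgamma


def p_alpha(k: int, alpha: float) -> float:
    """Limiting probability p_alpha(k) from (1.1), for k >= 1."""
    # prod_{j=1}^{k-1} (j + alpha) = Gamma(k + alpha) / Gamma(1 + alpha)
    log_num = lgamma(k + alpha) - lgamma(1 + alpha)
    # prod_{j=3}^{k+2} (j + 2 alpha) = Gamma(k + 3 + 2 alpha) / Gamma(3 + 2 alpha)
    log_den = lgamma(k + 3 + 2 * alpha) - lgamma(3 + 2 * alpha)
    return (2 + alpha) * exp(log_num - log_den)


def tail_alpha(k: int, alpha: float) -> float:
    """Tail probability P(D_alpha >= k), obtained by summing (1.1) in closed form."""
    log_num = lgamma(k + alpha) - lgamma(1 + alpha)
    log_den = lgamma(k + 2 + 2 * alpha) - lgamma(3 + 2 * alpha)
    return exp(log_num - log_den)
```

For instance, `p_alpha(1, alpha)` reduces to $(2+\alpha)/(3+2\alpha)$, the limiting proportion of leaves without a change point.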


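Further, arranging the degrees in $\mathcal{T}_n$ in decreasing order as $M_n(1) \ge M_n(2) \ge \cdots \ge M_n(n)$, it
is known [6, 41] that for any fixed $k \ge 1$, there exists a non-degenerate probability distribution
$\nu_k^\alpha$ on $\mathbb{R}_+^k$ such that

$$\left( \frac{M_n(j)}{n^{1/(2+\alpha)}} : 1 \le j \le k \right) \xrightarrow{d} \nu_k^\alpha. \tag{1.3}$$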


1.2.1. Model with change point: Now fix two attachment parameters $\alpha, \beta > 0$, a change
point parameter $\gamma \in (0,1)$, and a system size $n > 1$. The model does preferential attachment
as before, but now the attachment dynamics change after time $\lfloor n\gamma \rfloor$, namely

(a) For time $0 < m \le \lfloor n\gamma \rfloor$, the new vertex entering the system at time $m$ connects to
pre-existing vertices with probability proportional to their current out-degree $+ 1 + \alpha$.

(b) For time $\lfloor n\gamma \rfloor < m \le n$, the new vertex connects to pre-existing vertices with probability
proportional to their current out-degree $+ 1 + \beta$.

Let $\theta = (\alpha, \beta, \gamma)$ be the driving set of parameters of the model. We will let $\mathcal{T}_{\theta,m}$
denote the rooted tree at time $m$ and $\{\mathcal{T}_{\theta,m} : 1 \le m \le n\}$ the entire graph valued process.
When the context is clear, for ease of notation we suppress the dependence on $\theta$ and write
$\{\mathcal{T}_m : 1 \le m \le n\}$. This model is the main object of interest for the rest of the paper.
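The dynamics in (a) and (b) are straightforward to simulate. The following is a minimal sketch, not the code used for the figures in this paper; the names `simulate_change_point_tree` and `degrees` are hypothetical. It relies on the observation that, at a time when $m-1$ vertices and $m-2$ edges are present, choosing a vertex with probability proportional to out-degree $+ 1 + \theta$ is a mixture: with probability $(m-2)/\big((m-2) + (m-1)(1+\theta)\big)$ pick the parent of a uniformly chosen existing edge (which is proportional to out-degree), and otherwise pick a uniformly chosen vertex.

```python
import math
import random


def simulate_change_point_tree(n, alpha, beta, gamma, seed=None):
    """Simulate the tree of Section 1.2.1; vertex 0 is the root.

    Returns (out_deg, parents) where parents[i] is the parent of vertex i + 1.
    """
    rng = random.Random(seed)
    change = math.floor(n * gamma)      # last time step driven by alpha
    out_deg = [0]
    parents = []
    for m in range(2, n + 1):           # the vertex entering at time m
        theta = alpha if m <= change else beta
        existing = m - 1                # vertices currently in the tree
        edges = existing - 1            # edges currently in the tree
        total = edges + existing * (1.0 + theta)
        if rng.random() * total < edges:
            # proportional to out-degree: parent of a uniform random edge
            v = parents[rng.randrange(edges)]
        else:
            # the remaining weight (1 + theta) is the same for every vertex
            v = rng.randrange(existing)
        out_deg[v] += 1
        out_deg.append(0)
        parents.append(v)
    return out_deg, parents


def degrees(out_deg):
    """Degree = out-degree for the root, out-degree + 1 for every other vertex."""
    return [out_deg[0]] + [d + 1 for d in out_deg[1:]]
```

Each step costs constant time, so trees of the sizes used in the figures ($n$ of order $10^5$) can be generated quickly.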


1.3. Preliminary notation. To state our main results we will need to define some additional
objects. Recall the parameter set $\theta := (\alpha, \beta, \gamma)$ used to construct the model. Let
$\{E_\alpha(k) : k \ge 1\}$ be a sequence of independent exponential random variables such that for
each fixed $k \ge 1$, $E_\alpha(k)$ has rate $k + \alpha$. View the above as the inter-arrival times of a point
process $\mathcal{P}_\alpha$ on $\mathbb{R}_+$. More precisely write,

$$L_\alpha(m) = E_\alpha(1) + \cdots + E_\alpha(m), \qquad m \ge 1.$$

Consider the point process

$$\mathcal{P}_\alpha := (L_\alpha(1), L_\alpha(2), \ldots). \tag{1.4}$$

Analogously define $\{E_\beta(k) : k \ge 1\}$, $\{L_\beta(k) : k \ge 1\}$ and the corresponding point process $\mathcal{P}_\beta$.
For fixed $t \ge 0$, write $N_\alpha(t) := \mathcal{P}_\alpha[0,t]$ for the number of points in $\mathcal{P}_\alpha$ which fall in the interval
$[0,t]$.

We will need variants of the above point process. Fix $j \ge 1$ and $\alpha > 0$. Let $\mathcal{P}_\alpha^j$ be the
point process where we use the sequence $\{E_\alpha(m) : m \ge j\}$ to construct the point
process, so that the first point arrives after an exponential rate $j + \alpha$ amount of time, the
second point arrives at rate $j + 1 + \alpha$ after the first point and so forth. As before let $N_\alpha^j(\cdot)$
be the corresponding counting process and note that $N_\alpha^1(\cdot) = N_\alpha(\cdot)$.

Define the constant

$$a := \frac{1}{2+\beta} \log \frac{1}{\gamma}. \tag{1.5}$$

On the interval $[0,a]$, define the “truncated” exponential distribution described via the
cumulative distribution function

$$G_a(s) = \frac{1 - \exp(-(2+\beta)s)}{1 - \exp(-(2+\beta)a)}, \qquad s \in [0,a]. \tag{1.6}$$

Write Age for a random variable with distribution $G_a$ (the reason for this terminology will
become clear in the proof). Generate a counting process $N_\beta(\cdot)$ as above (independent of Age)
and let $X_{AC} = N_\beta[0, \mathrm{Age}]$, namely the number of points that occur before the random time
Age. Here AC is a mnemonic for “after change point”.

We are now in a position to define the limiting degree distribution. Consider the following
integer valued random variable $D_\theta$:

(a) With probability $1 - \gamma$, $D_\theta = 1 + X_{AC}$.

(b) With probability $\gamma$, $D_\theta = D_\alpha + N_\beta^{D_\alpha}[0,a]$ where $D_\alpha$ is a random variable with distribution
as in (1.1), namely the limiting degree distribution without change point. More precisely,
generate $D_\alpha$ with distribution as in (1.1). Conditional on $D_\alpha$, generate the point process
$\mathcal{P}_\beta^{D_\alpha}$, count the number of points in the interval $[0,a]$ and add this to the original
random variable $D_\alpha$.

Write $p_\theta = (p_\theta(k) : k \ge 1)$ for the probability mass function of the above random variable,
namely

$$p_\theta(k) = \mathbb{P}(D_\theta = k), \qquad k \ge 1. \tag{1.7}$$
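The two-case description of $D_\theta$ translates directly into a sampling routine. The sketch below is illustrative only and all function names are hypothetical. It assumes the readings $a = \log(1/\gamma)/(2+\beta)$ for the constant in (1.5) and $G_a(s) = (1 - e^{-(2+\beta)s})/(1 - e^{-(2+\beta)a})$ for (1.6). The variable $D_\alpha$ is sampled sequentially using $\mathbb{P}(D_\alpha = k \mid D_\alpha \ge k) = (2+\alpha)/(k+2+2\alpha)$, which follows from (1.1).

```python
import math
import random


def count_points(j, theta, horizon, rng):
    """Number of points in [0, horizon] of the point process whose successive
    inter-arrival times are exponential with rates j+theta, j+1+theta, ..."""
    t, count, rate = 0.0, 0, j + theta
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return count
        count += 1
        rate += 1.0


def sample_age(a, beta, rng):
    """Inverse-CDF sample from the truncated exponential distribution on [0, a]."""
    lam = 2.0 + beta
    u = rng.random()
    return -math.log(1.0 - u * (1.0 - math.exp(-lam * a))) / lam


def sample_D_alpha(alpha, rng):
    """Sample from p_alpha in (1.1) via its conditional stopping probabilities."""
    k = 1
    while rng.random() >= (2.0 + alpha) / (k + 2.0 + 2.0 * alpha):
        k += 1
    return k


def sample_D_theta(alpha, beta, gamma, rng):
    """One draw of the limiting degree D_theta for the change point model."""
    a = math.log(1.0 / gamma) / (2.0 + beta)
    if rng.random() < gamma:
        # case (b): the degree accumulated under alpha keeps growing under beta
        d = sample_D_alpha(alpha, rng)
        return d + count_points(d, beta, a, rng)
    # case (a): start from out-degree zero and run for a random time Age
    age = sample_age(a, beta, rng)
    return 1 + count_points(1, beta, age, rng)
```

A histogram of repeated draws, for example `[sample_D_theta(6.0, 1.0, 0.5, random.Random(0)) ...]` with a shared generator, gives a Monte Carlo approximation of $p_\theta$ that can be compared with simulated trees.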

2. Results 

Let us now describe our main results. We state results about the asymptotic degree
distribution in Section 2.1. We formulate statistical procedures to estimate the change point and
the associated consistency results in Section 2.2.


2.1. Asymptotics for the degree distribution. Fix $\theta \in \mathbb{R}_+^2 \times (0,1)$. For fixed $k \ge 1$ let
$N_n(k)$ denote the number of vertices with degree $k$ in the random tree $\mathcal{T}_n$ constructed in the
change point model as in Section 1.2.1. The random variable $D_\theta$ in the following result is as
defined in (1.7).


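Theorem 2.1. Fix $k \ge 1$. As $n \to \infty$ the degree distribution satisfies

$$\frac{N_n(k)}{n} \xrightarrow{P} \mathbb{P}(D_\theta = k).$$

Further for $\alpha \ne \beta$ and $\gamma \in (0,1)$, $p_\theta \ne p_\alpha$. However there exist constants $0 < c < c'$ such
that for all $k \ge 1$

$$\frac{c}{k^{\alpha+2}} \le \mathbb{P}(D_\theta \ge k) \le \frac{c'}{k^{\alpha+2}}. \tag{2.1}$$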

Remark 1. This theorem says that one does feel the effect of the change point in the
empirical degree distribution if $\alpha \ne \beta$ and $\gamma \in (0,1)$; however, comparing (2.1) with (1.2), for
any fixed $\gamma \in (0,1)$, this does not change the tail behavior. This is a little surprising as one
might assume, especially for $\gamma$ close to zero and $\beta < \alpha$ (where the no change point dynamics
with $\beta$ instead of $\alpha$ results in a degree distribution with a heavier tail), that the tail of the degree
distribution might scale like $k^{-(\beta+2)}$, namely that the dynamics of attachment driven by $\beta$ should
kick in. However this is not the case.

Remark 2. The techniques developed in this paper easily extend to the setting of multiple
change points. We describe these extensions in Theorem 3.1.


The next result deals with maximal degree asymptotics. As before arrange the degrees in
$\mathcal{T}_n$ in decreasing order as $M_n(1) \ge M_n(2) \ge \cdots \ge M_n(n)$.

Theorem 2.2. Fix $k \ge 1$ and consider the $k$ maximal degrees $(M_n(j) : 1 \le j \le k)$. Then the
sequence of $\mathbb{R}_+^k$ valued random variables defined by setting

$$\hat{\mathbf{M}}_n(k) := \left( \frac{M_n(j)}{n^{1/(2+\alpha)}} : 1 \le j \le k \right), \qquad n \ge 1,$$

is tight and bounded away from zero.


Remark 3. Comparing the scaling of the maximal degrees above to the setting of no change
point as described in (1.3), one sees that the maximal degrees do not feel the effect of the
change point, at least in terms of their order of magnitude. We further conjecture that
$\{\hat{\mathbf{M}}_n(k) : n \ge 1\}$ converges weakly to a non-degenerate distribution on $\mathbb{R}_+^k$. We have not
pursued this further in this paper.


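2.2. Change point detection. The aim of this section is to formulate a non-parametric
estimator for the change point based on observations of the network and state consistency
results for this estimator. We first need some notation. For fixed $k \ge 1$ let $N_n(k,m)$ denote
the number of vertices with degree $k$ in the tree $\mathcal{T}_m$. Rescaling time by $n$, for $0 \le t \le 1$, let
(with a slight abuse of notation) $N_n(k,t) = N_n(k,nt)$. Finally define

$$\hat{p}_n(k,t) = \frac{N_n(k,t)}{nt}, \qquad 0 < t \le 1, \tag{2.2}$$

namely the proportion of vertices with degree $k$ at time $nt$. The $k = 1$ case corresponds to
the proportion of leaves. To ease notation in the displays below, write $\hat{p}_n(1,t) = p_t^n$. Now define
the continuous function

$$p_t^{(\infty)} = \begin{cases} \dfrac{2+\alpha}{3+2\alpha} & \text{if } 0 \le t \le \gamma, \\[2ex] \dfrac{2+\beta}{3+2\beta} \left( 1 - \left( \dfrac{\gamma}{t} \right)^{\frac{3+2\beta}{2+\beta}} \right) + \dfrac{2+\alpha}{3+2\alpha} \left( \dfrac{\gamma}{t} \right)^{\frac{3+2\beta}{2+\beta}} & \text{if } \gamma < t \le 1. \end{cases} \tag{2.3}$$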
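The shape of $p_t^{(\infty)}$ can be understood through a heuristic that is not the proof given later. After the change point, a new vertex is itself a leaf and destroys an existing leaf with probability approximately $\frac{1+\beta}{2+\beta}$ times the current proportion of leaves, so $\ell(t) := t\,p_t^{(\infty)}$ should solve $\ell'(t) = 1 - \frac{1+\beta}{2+\beta}\,\ell(t)/t$ with $\ell(\gamma) = \gamma\,\frac{2+\alpha}{3+2\alpha}$. The sketch below (the name `leaf_limit` is hypothetical) evaluates the closed-form solution of this equation, which is our reading of (2.3).

```python
def leaf_limit(t, alpha, beta, gamma):
    """Limiting proportion of leaves at rescaled time t in (0, 1]."""
    before = (2.0 + alpha) / (3.0 + 2.0 * alpha)   # stationary value under alpha
    if t <= gamma:
        return before
    after = (2.0 + beta) / (3.0 + 2.0 * beta)      # stationary value under beta
    # weight still carried by the pre-change regime at time t
    w = (gamma / t) ** ((3.0 + 2.0 * beta) / (2.0 + beta))
    return after * (1.0 - w) + before * w
```

With $\alpha = 6$, $\beta = 1$ and $\gamma = 0.5$ the curve is flat at $8/15$ up to the change point and then increases towards, but does not reach, the value $3/5$ associated with $\beta$.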


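We will prove in Section 5.1 that for each fixed $0 < t \le 1$, $p_t^{(\infty)}$ is the limiting
proportion of leaves in $\mathcal{T}_{nt}$. To simplify notation in the sequel, define the function $\delta : \mathbb{R}_+ \to
[0,1]$ by the prescription

$$\delta_u := \frac{1+u}{2+u}, \qquad u \ge 0. \tag{2.4}$$

Note that $p_t^{(\infty)} = p_\gamma^{(\infty)}$ for $t \le \gamma$. Now define the positive function $\{\sigma_M(t) : 0 \le t \le 1\}$ via the
formulae

$$\sigma_M^2(t) := \begin{cases} t^{2\delta_\alpha} \left[ \delta_\alpha p_t^{(\infty)} \left( 1 - \delta_\alpha p_t^{(\infty)} \right) \right] & \text{if } 0 \le t \le \gamma, \\[1ex] \gamma^{2\delta_\alpha} \left( \dfrac{t}{\gamma} \right)^{2\delta_\beta} \left[ \delta_\beta p_t^{(\infty)} \left( 1 - \delta_\beta p_t^{(\infty)} \right) \right] & \text{if } \gamma < t \le 1. \end{cases} \tag{2.5}$$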
Figure 2.1. Log-log plot showing the limiting degree distribution (red) and the
simulated network degree distribution (blue) with network size n = 500,000,
together with a corresponding sample of the same size from the predicted degree
distribution. The model parameters are taken as $\alpha = 6$, $\beta = 1$ and the change point
$\gamma = 0.5$. We discuss other values of the parameters in Section 3.


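For later use define the functions

$$\sigma^2(t) := \begin{cases} \delta_\alpha p_t^{(\infty)} \left( 1 - \delta_\alpha p_t^{(\infty)} \right) & \text{if } 0 \le t \le \gamma, \\ \delta_\beta p_t^{(\infty)} \left( 1 - \delta_\beta p_t^{(\infty)} \right) & \text{if } \gamma < t \le 1, \end{cases} \tag{2.6}$$

and

$$\mu(t) := \begin{cases} -\dfrac{\delta_\alpha}{t^{\delta_\alpha + 1}} & 0 < t \le \gamma, \\[2ex] -\dfrac{\delta_\beta \gamma^{\delta_\beta - \delta_\alpha}}{t^{\delta_\beta + 1}} & \gamma < t \le 1. \end{cases} \tag{2.7}$$

Define the diffusion $\{M(t) : 0 \le t \le 1\}$ via the prescription

$$dM(t) = \sigma_M(t)\,dB(t), \qquad 0 \le t \le 1. \tag{2.8}$$

Here $\{B(u) : u \ge 0\}$ is standard Brownian motion on $\mathbb{R}_+$. Thus $M$ is essentially a
deterministic time change of $B(\cdot)$, namely

$$\phi(t) = \int_0^t \sigma_M^2(s)\,ds, \qquad \{M(t) : 0 \le t \le 1\} \stackrel{d}{=} \{B(\phi(t)) : 0 \le t \le 1\}. \tag{2.9}$$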
In particular $M(\cdot)$ is a Gaussian process on $[0,1]$. Finally define the function

$$g(t) := \begin{cases} \dfrac{1}{t^{\delta_\alpha}} & \text{if } 0 < t \le \gamma, \\[2ex] \dfrac{\gamma^{\delta_\beta - \delta_\alpha}}{t^{\delta_\beta}} & \text{if } \gamma < t \le 1. \end{cases} \tag{2.10}$$

Define the process

$$G(t) = g(t) M(t), \qquad 0 < t \le 1. \tag{2.11}$$

By Itô's formula $G(\cdot)$ solves the SDE

$$dG(t) = \mu(t) M(t)\,dt + \sigma(t)\,dB(t), \tag{2.12}$$

where $\sigma(\cdot)$ and $\mu(\cdot)$ are as in (2.6) and (2.7) respectively. Then we have the following result.


Theorem 2.3. Consider the process of re-centered and normalized number of leaves

$$G_n(t) := \frac{N_n(1,t) - nt\,p_t^{(\infty)}}{\sqrt{n}}, \qquad 0 \le t \le 1, \tag{2.13}$$

with linear interpolation between time points. Then $G_n \to G$ where $G$ is the diffusion defined
in (2.12) and convergence is weak convergence on $D([0,1])$ with respect to the usual
Skorohod metric.

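For the rest of this section, let $\hat{p}_n(m)$ denote the proportion of leaves (degree one vertices)
in $\mathcal{T}_m$. Fix $\varepsilon > 0$. We will define two functions on the interval $[\varepsilon, 1]$. Let

$${}_t h^{(n)} = \frac{1}{n(t - \varepsilon)} \sum_{m = n\varepsilon}^{nt} \hat{p}_n(m), \qquad \varepsilon < t \le 1. \tag{2.14}$$

Let

$$h_t^{(n)} = \frac{1}{n(1-t)} \sum_{m = nt+1}^{n} \hat{p}_n(m), \qquad \varepsilon < t \le 1. \tag{2.15}$$

In words, ${}_t h^{(n)}$ represents the average proportion of leaves in the process between time $n\varepsilon$ and
$nt$ while $h_t^{(n)}$ represents the same quantity but after time $nt$. Define the function

$$D_n(t) := (1-t) \left| {}_t h^{(n)} - h_t^{(n)} \right|, \qquad t \in [\varepsilon, 1]. \tag{2.16}$$

Write $\mathcal{M}_n$ for the collection of points $t$ for which the corresponding function value $D_n(t)$ is
within $\log n / \sqrt{n}$ of the maximum of the function. Precisely, let $D_n^* = \max_{t \in [\varepsilon, 1]} D_n(t)$ and let

$$\mathcal{M}_n := \left\{ t \in [\varepsilon, 1] : |D_n(t) - D_n^*| \le \frac{\log n}{\sqrt{n}} \right\}. \tag{2.17}$$

Finally let

$$\hat{\gamma}_n := \max\{ t : t \in \mathcal{M}_n \}. \tag{2.18}$$

The functionals ${}_t h^{(n)}$, $h_t^{(n)}$ and $\hat{\gamma}_n$ all depend on $\varepsilon$ but we suppress this dependence to ease
exposition below.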

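Theorem 2.4. Assume that the change point $\gamma > \varepsilon$. Then the estimator $\hat{\gamma}_n \xrightarrow{P} \gamma$ and in
fact

$$|\hat{\gamma}_n - \gamma| = O_P\left( \frac{\log n}{\sqrt{n}} \right). \tag{2.19}$$

Thus $\hat{\gamma}_n$ is a consistent estimator for the change point $\gamma$.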
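The estimator can be computed in linear time from the sequence of leaf proportions using prefix sums. The following is a minimal sketch under stated assumptions, not the implementation behind the figures: the function name `estimate_change_point` and its `tol` argument are hypothetical, the grid of candidate values is $t = m/n$, and the two averages are normalized by the number of terms in each sum rather than by $n(t-\varepsilon)$ and $n(1-t)$, which differs only at order $1/n$. By default `tol` is the threshold $\log n/\sqrt{n}$ of (2.17), but it is left as a parameter since that choice is asymptotic.

```python
import math


def estimate_change_point(p_hat, eps, tol=None):
    """Return (gamma_hat, times, values) where values[i] = D_n(times[i]).

    p_hat[m - 1] is the observed proportion of leaves at time m = 1, ..., n.
    """
    n = len(p_hat)
    if tol is None:
        tol = math.log(n) / math.sqrt(n)
    prefix = [0.0]
    for p in p_hat:
        prefix.append(prefix[-1] + p)       # prefix[m] = p_hat(1) + ... + p_hat(m)
    lo = max(1, math.ceil(eps * n))         # first time index entering the averages
    times, values = [], []
    for m in range(lo, n):                  # candidate change times t = m / n
        before = (prefix[m] - prefix[lo - 1]) / (m - lo + 1)
        after = (prefix[n] - prefix[m]) / (n - m)
        times.append(m / n)
        values.append((1.0 - m / n) * abs(before - after))
    d_star = max(values)
    # the largest t whose value of D_n is within tol of the maximum
    gamma_hat = max(t for t, d in zip(times, values) if d >= d_star - tol)
    return gamma_hat, times, values
```

Taking the largest element of the near-maximizing set, rather than an arbitrary maximizer, matters here: when the leaf proportions are constant before the change point, the quantity $(1-t)\,|{}_t h^{(n)} - h_t^{(n)}|$ is essentially constant in $t$ on $[\varepsilon, \gamma]$ and only starts to decrease after $\gamma$.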
Remark 4. The $\varepsilon$-truncation away from zero is a technical compensation for the factor $t$ in
the denominator in (2.14). Technically one should be able to choose a sequence $\varepsilon_n \downarrow 0$ slowly
enough such that the above result (modified using this sequence $\varepsilon_n$ instead of the fixed $\varepsilon$) is
true. This would make the assumption of $\gamma > \varepsilon$ irrelevant in the statement of the Theorem.


Remark 5. The threshold $\log n / \sqrt{n}$ in (2.17) was arbitrary in the sense that if we chose the
threshold to be $\omega_n / \sqrt{n}$ where $\omega_n \to \infty$ arbitrarily slowly then the corresponding estimator
would satisfy (2.19) with bound $\omega_n / \sqrt{n}$.


Remark 6. See Figure 2.2 for a figure based on simulations for the function $D_n(t)$ with $\varepsilon$
taken to be zero.


Figure 2.2. The function $D_n(t)$ plotted against time, with network size n = 200,000, and model
parameters $\alpha = 6$, $\beta = 1$ and the change point $\gamma = 0.5$ as in Figure 2.1 (plot title: n=200000,
alpha=6, beta=1, gamma=0.5, K=1).


3. Discussion 

We now discuss the relevance of our results, their connections to existing literature and 
possible extensions of the results in this paper. 

3.1. Multiple change points. The proof techniques carry over in a straightforward fashion
to the general setting of multiple change points. Fix time points $0 < \gamma_1 < \gamma_2 < \cdots < \gamma_k < 1$
and parameters $\alpha, (\beta_i)_{1 \le i \le k}$. As before write $\theta = (\alpha, (\beta_j)_{1 \le j \le k}, (\gamma_i)_{1 \le i \le k})$ for the parameter set.
Consider the random tree $\mathcal{T}_n = \mathcal{T}_{\theta,n}$ where

(i) In the interval $\{1 < t \le \gamma_1 n\}$, vertices use the attachment scheme driven by $\alpha$ (namely
each new vertex attaches to an existing vertex with probability proportional to
out-degree $+ 1 + \alpha$).

(ii) In subsequent intervals $\{\gamma_j n < t \le \gamma_{j+1} n\}$ where $1 \le j \le k$, vertices perform the
attachment scheme driven by the parameter $\beta_j$. Here we use the convention $\gamma_0 = 0$,
$\gamma_{k+1} = 1$.
As in Section 1.3 define the point processes $\mathcal{P}_\alpha, \mathcal{P}_{\beta_i}$ and for fixed $j \ge 1$, the point processes
$\mathcal{P}_\alpha^j, \mathcal{P}_{\beta_i}^j$. To simplify notation, for any $t \ge 0$ and point process $\mathcal{P}$, write $\mathcal{P}[0,t]$ for the number
of points in the interval $[0,t]$. Define the constants

$$\pi_j := \gamma_{j+1} - \gamma_j, \quad 0 \le j \le k, \qquad a_j := \frac{1}{2+\beta_j} \log \frac{\gamma_{j+1}}{\gamma_j}, \quad 1 \le j \le k. \tag{3.1}$$

Note that $\pi = (\pi_0, \pi_1, \ldots, \pi_k)$ is a probability mass function. Write Epoch for a random
variable with distribution $\pi$ (i.e. $\mathbb{P}(\mathrm{Epoch} = i) = \pi_i$ for $0 \le i \le k$). Using the constants
$\{a_i : 1 \le i \le k\}$ let $G_{a_i}$ denote the corresponding truncated exponential distributions as in (1.6)
(with $\beta_i$ in place of $\beta$) and let $\mathrm{Age}_i$ denote a random variable with distribution $G_{a_i}$. Now construct the random
variable TimeAlive as follows:


(a) Generate a collection of independent random variables Epoch and $\{\mathrm{Age}_i : 1 \le i \le k\}$ with
distributions specified as above.

(b) Conditional on Epoch $= i$, let

$$\mathrm{TimeAlive} = \mathrm{Age}_i + \sum_{j=i+1}^{k} a_j,$$

where again by convention, if Epoch $= 0$, $\mathrm{Age}_0 = 0$ and so TimeAlive $= \sum_{j=1}^{k} a_j$.
Construct a positive integer valued random variable $D_\theta$ as follows:

(i) Generate Epoch $\sim \pi$ as above and the corresponding random variable TimeAlive.

(ii) If Epoch takes a non-zero value $1 \le i \le k$, conditional on Epoch $= i$, generate the
switching point process $\mathcal{P}^*$ on the interval $[0, \mathrm{TimeAlive}]$ as follows:

(a) Initialization: In the interval $[0, \mathrm{Age}_i]$, start with $\mathcal{P}^* = \mathcal{P}_{\beta_i}$. Suppose by time $\mathrm{Age}_i$,
$\mathcal{P}^*[0, \mathrm{Age}_i] = k_i$. Now generate a point process $\mathcal{P}_{\beta_{i+1}}^{k_i+1}$ and let
$\mathcal{P}^*[0, \mathrm{Age}_i + a_{i+1}] = \mathcal{P}^*[0, \mathrm{Age}_i] + \mathcal{P}_{\beta_{i+1}}^{k_i+1}[0, a_{i+1}]$.

(b) Recursion: For each subsequent interval with index $j > i$, conditional on
$\mathcal{P}^*[0, \mathrm{Age}_i + a_{i+1} + \cdots + a_j] = k_j$, generate the point process $\mathcal{P}_{\beta_{j+1}}^{k_j+1}$. Define

$$\mathcal{P}^*[0, \mathrm{Age}_i + a_{i+1} + \cdots + a_{j+1}] = \mathcal{P}^*[0, \mathrm{Age}_i + a_{i+1} + \cdots + a_j] + \mathcal{P}_{\beta_{j+1}}^{k_j+1}[0, a_{j+1}].$$

Iterate until the last interval, resulting in $\mathcal{P}^*[0, \mathrm{TimeAlive}]$.

Now define $D_\theta = 1 + \mathcal{P}^*[0, \mathrm{TimeAlive}]$.

(iii) If Epoch $= 0$, so that TimeAlive $= a_1 + \cdots + a_k$, generate a random variable $D_\alpha$ with
distribution $p_\alpha$ as in (1.1). Conditional on $D_\alpha$, generate $\mathcal{P}^*$ in the interval $[0, a_1]$ with
distribution $\mathcal{P}_{\beta_1}^{D_\alpha}$ and then sequentially proceed as in (ii). In this case, define
$D_\theta = D_\alpha + \mathcal{P}^*[0, \mathrm{TimeAlive}]$.


Write $p_\theta(\cdot)$ for the pmf of $D_\theta$. As before for $k \ge 1$, let $N_n(k)$ denote the number of vertices
with degree $k$ in $\mathcal{T}_n$. Then we have the following result.


Theorem 3.1. As $n \to \infty$ we have

$$\frac{N_n(k)}{n} \xrightarrow{P} p_\theta(k).$$

Further there exist constants $0 < c < c'$ such that for all $k \ge 1$

$$\frac{c}{k^{\alpha+2}} \le \mathbb{P}(D_\theta \ge k) \le \frac{c'}{k^{\alpha+2}}. \tag{3.2}$$
3.2. Change point detection: This problem has a vast history owing to its obvious
importance in applications in fields ranging from quality control and reliability of industrial
processes, in particular quick detection of process failure in production, to fields such as
signal processing (e.g. biomedical data including neuronal spike data and seismic data),
automatic segmentation of signals into stationary segments via identification of change points
etc. While it is impossible to provide a representative sampling of this area, we direct the
interested reader to [5, 13, 16, 17, 21, 51-53] and the references therein for an overview of just
some of the statistical methodology as well as applications.

In this context, recall the motivating example of an independent stream of data
$\{X_i : 1 \le i \le n\}$ with a change point in the distribution from $F$ to $G$ at time $n\gamma$ described
in Section 1. Let ${}_t H^{(n)}$ and $H_t^{(n)}$ denote the empirical distribution of the data before and
after $t$, namely

$${}_t H^{(n)} := \frac{1}{nt} \sum_{i=1}^{nt} \delta_{X_i}, \qquad H_t^{(n)} := \frac{1}{n(1-t)} \sum_{i=nt+1}^{n} \delta_{X_i}, \qquad 0 < t < 1.$$

Now define

$$D_n(t) := t(1-t)\,\mathrm{dist}\left( {}_t H^{(n)}, H_t^{(n)} \right),$$

where dist is any standard notion of distance between probability distributions on $\mathbb{R}$, e.g. the
Kolmogorov-Smirnov supremum norm or total variation distance. Finally define

$$\hat{\gamma}_n = \arg\max_{t \in [0,1]} D_n(t).$$

Then in [16] it is shown that $\hat{\gamma}_n$ is a consistent estimator of $\gamma$. This was partial motivation
for our estimator. Note the “asymmetry” as a function of $t$ between the “classical” context,
where the weight is $t(1-t)$, and the model with change point, where the weight in (2.16) is $(1-t)$,
highlighting the non-ergodic nature of the evolution of the
model after the change point.

A second point to note is that we use information on leaf densities in the large network
$n \to \infty$ limit. As in [48], one should be able to build on the functional CLT for leaf counts to
establish a joint functional CLT for $\{N_n(k,t) : 1 \le k \le K, 0 \le t \le 1\}$ after proper
normalization and re-centering for any fixed $K \ge 1$. Modifying the estimator in Section 6 should
enable one to get estimators that perform better for finite $n$.


3.3. Temporal networks and change points: As described in the introduction, the
availability of data on real world networks over the last few years has motivated development of
mathematical methodology in a wide array of fields including computer science, statistical
physics and probability to make sense of this data. With regards to problems philosophically
similar to change point detection, analogous to segmentation and boundary detection [37, 57],
there has been a significant amount of work detecting anomalous subgraphs and motifs within
networks, see e.g. [1, 28, 44]; for a wide-ranging survey see [18]. This also includes anomalous
edge detection via link prediction algorithms [32]. With regards to detection of change points
in temporal (time-varying) network data and in particular structural properties of these
objects, see [55], which posits an algorithmic approach based on minimum description length to
understand evolving communities in social networks. For statistically grounded approaches
see [24, 30, 39, 40, 47, 50, 61]. See [46] for an overview of the state of the art regarding change
point detection in networks; that paper also develops new statistical methodology using a generalized
hierarchical random graph model (GHRG) and various likelihood ratio based test statistics to
detect existence of change points via online detection algorithms, and studies the
performance of these algorithms on simulated as well as real data including the MIT proximity
data [27] and the ENRON email network data. See [38, 59] for rigorous analysis of models
where each time slice of the model is assumed to be an Erdős-Rényi random graph.


3.4. Preferential attachment: This model has become one of the standard workhorses in
the complex networks community, in particular for its ability to give a generative reason for
the power law/heavy tailed degree distribution observed in an array of real world systems.
At this point it is impossible to compile a representative list of references; we will try to give
an overview, restricting ourselves as far as possible to papers close in spirit to this paper. See
[56] where it was introduced in the combinatorics community, [4] for bringing this model to
the attention of the networks community, [43], [23] for survey level treatments of a wide array
of models, [12] for the first rigorous results on the asymptotic degree distribution, and [20],
[26] and [49] and the references therein for more general models and results.


We are not aware of other analysis of the effect of change point in structural properties of 
such network models. There has been a lot of recent interest in understanding and detecting 
the “initial seed” [14, 15, 22]. Here one starts with an initial “seed graph” at time m = 0 and
then performs preferential attachment started from that seed. The aim is then to estimate this 
initial seed based on an observation of the network at some large time n. While different from 
this paper, this body of work again emphasizes the sensitive dependence on initial conditions 
for such network models. 


3.5. Proof techniques: A number of techniques have been developed to rigorously analyze
functionals such as asymptotic degree distributions (see [26, 58] for nice pedagogical
treatments). The standard technique involves writing down recursions for the expected degree
distribution $\mathbb{E}(N_n(k))$ using the prescribed dynamics of the process, showing that these
expectations (normalized by $n$) converge in the limit, and then showing that the deviations
$|N_n(k) - \mathbb{E}(N_n(k))|$ are small via concentration inequalities.

In this paper, for understanding structural properties we use a different technique,
essentially embedding the discrete time model in a corresponding “continuous time” branching
process $\{\mathrm{BP}_\theta(t) : t \ge 0\}$ (based on the Athreya-Karlin embedding of urn processes [3]). This
explains the various point processes that arise in the description of the limiting degree
distribution. While mathematically more involved, this technique gives more insight into the
results as it elucidates the natural time scale of the process. In various other settings this
technique has resulted in the study of much more general functionals of the process such as
the spectral distribution of the adjacency matrix [6] and has been used to derive asymptotic
results in “non-local” preferential attachment models [7]. In this paper the technique also
allows one to intuitively understand why the degree exponent does not change. We advise
the reader to come back to the text below after going through the proofs but let us explain
the basic intuition here. In the continuous time version, the process grows exponentially and
in particular takes time $\tau_{\gamma n} \sim \frac{1}{2+\alpha} \log \gamma n + O_P(1)$ to get to size $n\gamma$. At this time there is
a change in the evolution where each vertex adopts attachment dynamics driven by the
parameter $\beta$. However owing to the exponential growth rate, the time for the process to get to
size $n$ is $\tau_n \sim \tau_{\gamma n} + a$ where $a$ is as in (1.5). Thus the process does not have enough time
for the dynamics with attachment parameter $\beta$ to change the degree exponent (since we only
have to wait an $O(1)$ extra units of time to get to system size $n$ from $\gamma n$). These ideas are
made mathematically rigorous in the next few sections. For the interested reader, much of
the foundational work on continuous time branching processes relevant for this paper can be
found in [33-35].
3.6. Empirical dependence of the convergence on parameter values: Recall that
the Gaussian process defined in (2.12) underlying the main consistency result Theorem 2.4
depends on $\theta = (\alpha, \beta, \gamma)$. One consequence of this dependence is that when the parameter
values $\alpha$ and $\beta$ are close, the change point becomes harder to detect in the sense that larger
$n$ is required to get good estimates. This is most easily seen in terms of the fluctuations of the
proportion of leaves in the graph.


Figure 3.1. Empirical proportion of leaves over time in a simulation with n =
200,000, $\alpha = 6$, $\beta = 1$, $\gamma = 0.5$. The red line represents the theoretical
predictions in (2.3).


In both Figures 3.1 and 3.2 the preferential attachment process starts with $\alpha = 6$ and
decreases, to $\beta = 1$ in Figure 3.1 and $\beta = 5$ in Figure 3.2. Furthermore the predicted behavior (red line)
is almost the same: the proportion of leaves is constant up to the change point $\gamma = 0.5$ and
then increases, consistent with a decrease in the attachment parameter.

Although the final graphs in both simulations have $n = 200,000$ vertices, at
first glance the fluctuations appear much greater in the latter case. On closer examination
however, this is simply an artifact of the axis scales. In essence, when the shift in parameters is
smaller, the change in the proportion of leaves pre- and post-$\gamma$ is smaller compared to the
natural fluctuations in the proportion of leaves, which are of order $1/\sqrt{n}$ (Theorem 2.3). Therefore
any difference is more difficult to detect for the same $n$. This is not surprising, but worth noting
in practice.
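The experiment behind Figures 3.1 and 3.2 can be imitated with a short simulation. The sketch below is not the authors' code. It assumes the attachment rule suggested by the reproduction rates used later in the text (a vertex with $k$ children is chosen with weight $k + 1 + \alpha$ before the tree reaches size $\gamma n$ and $k + 1 + \beta$ afterwards), and it counts a vertex as a leaf when it has no children. The function name and its arguments are illustrative.

```python
import random

def simulate_leaf_proportion(n, alpha, beta, gamma, seed=0):
    """Grow a change point preferential attachment tree on n vertices.

    Assumption: a vertex with k children is picked with weight k + 1 + param,
    where param = alpha while the tree has fewer than floor(gamma * n)
    vertices and param = beta afterwards.
    Returns props, where props[m - 1] is the leaf proportion at tree size m.
    """
    rng = random.Random(seed)
    change = int(gamma * n)      # tree size at which the parameter switches
    children = [0]               # children[v] = number of children of v
    edge_parents = []            # one entry per edge: the parent endpoint
    leaves = 1                   # the lone root is counted as a leaf
    props = [1.0]
    for m in range(1, n):        # m = current number of vertices
        param = alpha if m < change else beta
        # Total weight: (m - 1) from child counts plus m * (1 + param).
        total = (m - 1) + m * (1 + param)
        if rng.random() * total < (m - 1):
            # Parent of a uniform edge: probability proportional to child count.
            v = edge_parents[rng.randrange(m - 1)]
        else:
            # Uniform vertex: accounts for the constant weight 1 + param.
            v = rng.randrange(m)
        if children[v] == 0:
            leaves -= 1          # v stops being a leaf
        children[v] += 1
        edge_parents.append(v)
        children.append(0)       # the new vertex is a leaf
        leaves += 1
        props.append(leaves / (m + 1))
    return props
```

Calling `simulate_leaf_proportion(200000, 6.0, 5.0, 0.5)` and plotting the result against `m / n` gives a curve of the kind described above. Under the assumed rule, a mean-field balance suggests a pre-change leaf proportion near $(2+\alpha)/(3+2\alpha)$, which is $8/15 \approx 0.533$ for $\alpha = 6$.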


4. Proofs 


As described in Section 3.5, the main conceptual idea is a continuous time embedding of
the discrete time process. We start in Section 4.1 by describing this embedding and deriving
simple properties. Then in Section 4.2 we prove Theorem 2.1. Section 4.3 proves the assertion
that the degree exponent does not change. Section 4.4 analyzes asymptotics for the maximal
degrees. Section 5 contains an in-depth analysis of the density of leaves and proves Theorem
2.3. Section 6 then uses this Theorem to prove the consistency of the estimator, namely
Theorem 2.4.
[Plot titled "alpha = 6, beta = 5, gamma = 0.5"; horizontal axis: Time, from 0.00 to 1.00.]

Figure 3.2. Empirical proportion of leaves in a simulation with $n = 200,000$,
$\alpha = 6$, $\beta = 5$, $\gamma = 0.5$. The red line represents the theoretical predictions in (2.3).


4.1. Preliminaries. We start with the following definition. To ease notation, for the rest of
the paper we use $\gamma n$ instead of $\lfloor \gamma n \rfloor$.


Definition 4.1 (Continuous time branching process). Fix $\alpha > 0$. We let $\{\mathrm{BP}_\alpha(t) : t \ge 0\}$ be
a continuous time branching process driven by the point process $\mathcal{P}_\alpha$ defined in (1.4). Precisely:

(a) At time $t = 0$ we start with one individual called the root $\rho$ with an offspring point process
with distribution $\mathcal{P}_\alpha^{\rho} \stackrel{d}{=} \mathcal{P}_\alpha$. The times of this point process represent times of birth of new
offspring of $\rho$.

(b) Every new vertex $v$ that is born into the system is given its own offspring point process
$\mathcal{P}_\alpha^{v} \stackrel{d}{=} \mathcal{P}_\alpha$, independent across vertices.


Label vertices using integer labels according to the order in which they enter $\mathrm{BP}_\alpha$ so that
the root is labelled as 1, the next vertex to be born is labelled 2 and so on. For fixed $t \ge 0$,
we will view $\mathrm{BP}_\alpha(t)$ as a (random) labelled tree representing the genealogical relationships
between all individuals in the population present at time $t$. See Figures 4.1 and 4.2. Write
$|\mathrm{BP}_\alpha(t)|$ for the number of individuals in the tree by time $t$. Fix $m \ge 1$ and define the stopping
time
$$\tau_m := \inf\{t : |\mathrm{BP}_\alpha(t)| = m\}. \tag{4.1}$$
Since there are no deaths and each individual reproduces at rate at least $1 + \alpha$, the stopping
times $\tau_m < \infty$ a.s. for all $m \ge 1$. Now consider the original preferential attachment model
where there is no change point. Using properties of the exponential distribution, the following
Lemma is easy to check and is just a special case of the famous Athreya-Karlin embedding.

Figure 4.1. The process $\mathrm{BP}_\alpha(\cdot)$ in continuous time starting from the root $\rho$ and
stopped at $\tau_{15}$.

Figure 4.2. The corresponding discrete tree containing only the genealogical information of
vertices in $\mathrm{BP}_\alpha(\tau_{15})$.

Lemma 4.2. Viewed as random rooted trees on vertex set $[n]$ one has $\mathrm{BP}_\alpha(\tau_n) \stackrel{d}{=} \mathcal{T}_n$. In fact
the two processes of growing random trees have the same distribution, namely
$$\{\mathrm{BP}_\alpha(\tau_n) : n \ge 1\} \stackrel{d}{=} \{\mathcal{T}_n : n \ge 1\}.$$

To construct the variant of $\mathcal{T}_n$ where one has a change point, we run $\mathrm{BP}_\alpha(\cdot)$ till time $\tau_{\gamma n}$ (when
the original process reaches size $\gamma n$) and then every vertex changes the way it reproduces.
More precisely, after this stopping time, an individual with $k$ children would have reproduced
at rate $k + 1 + \alpha$ in the original model but in the change point model this vertex reproduces
at rate $k + 1 + \beta$ and uses the parameter $\beta$ instead of $\alpha$ for each subsequent offspring time.
Each new vertex $v$ produced after time $\tau_{\gamma n}$ reproduces according to an independent copy
of the point process $\mathcal{P}_\beta$. Call the resulting process $\mathrm{BP}_\theta(\cdot)$ and run the process till time $\tau_n$
when the continuous time process has $n$ individuals. Analogous to (4.1), define the collection
of stopping times $\{\tau_m : 1 \le m \le n\}$ by replacing $\mathrm{BP}_\alpha$ with $\mathrm{BP}_\theta$. The following is a simple
extension of the previous Lemma.


Lemma 4.3. Recall the family of random trees $\{\mathcal{T}_{\theta,m} : 1 \le m \le n\}$ generated using the change
point preferential attachment model in Section 1.2.1. Then,
$$\{\mathrm{BP}_\theta(\tau_m) : 1 \le m \le n\} \stackrel{d}{=} \{\mathcal{T}_{\theta,m} : 1 \le m \le n\}.$$

Remark 7. Note that the processes $\{\mathcal{T}_{\theta,m} : 1 \le m \le n\}$ when one has a change point are not
nested in a nice manner as growing trees for different values of $n$. Compare this with the
original model (without change point) where we can view the entire sequence $\{\mathcal{T}_n : n \ge 1\}$ as
an increasing family of random trees. In the above construction it will be convenient to couple
the processes across different $n$ by using a single common branching process $\mathrm{BP}_\alpha$ to generate
the tree before the change point $\tau_{\gamma n}$ and then let the process evolve independently after the
change point for different $n$ using the prescribed dynamics modulated by the attachment
parameter $\beta$. Further it will be convenient to allow the process $\mathrm{BP}_\theta$ to continue to grow after
time $\tau_n$ as opposed to stopping it exactly at time $\tau_n$.


For future reference, for each vertex $v$, we will use $T_v$ for the time of birth of this vertex
into the system. For fixed time $t$ and a vertex $v$ born before time $t$ (namely $T_v \le t$), we write
$d_v(t)$ for the number of children of this vertex by time $t$. Note that for all $v \ne \rho$ in $\mathrm{BP}_\theta^n(t)$,
the full degree of $v$ by time $t$ is $d_v(t) + 1$.

We will need some simple stochastic calculus calculations below to derive martingales
related to processes of interest. Given a process $\{Z(t) : t \ge 0\}$ adapted to a filtration
$\{\mathcal{F}(t) : t \ge 0\}$, we write $E(dZ(t) \mid \mathcal{F}(t)) = a(t)dt$ for an adapted process $a(\cdot)$ if $Z(t) - \int_0^t a(s)ds$
is a (local) martingale. Similarly write $\mathrm{Var}(dZ(t) \mid \mathcal{F}(t)) = b(t)dt$ if the process
$$V(t) := \Big(Z(t) - \int_0^t a(s)ds\Big)^2 - \int_0^t b(s)ds, \qquad t \ge 0,$$
is a local martingale.

Now recall that $\mathrm{BP}_\alpha(\tau_{\gamma n})$ is the random tree before the change point. These random trees
are distributed as the original preferential attachment model without change point using
attachment dynamics with parameter $\alpha$. Using (1.1) and recalling that $N_n(k, \gamma n)$ denotes the
number of vertices with degree $k$ results in the following.

Lemma 4.4. For each fixed $k \ge 1$ we have $N_n(k, \gamma n)/\gamma n \xrightarrow{\text{a.e.}} p_\alpha(k)$ as $n \to \infty$, where $p_\alpha(\cdot)$
is the probability mass function in (1.1).

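Recall that the branching process $\mathrm{BP}_\alpha$ is driven by the offspring point process $\mathcal{P}_\alpha$ and
$\mathcal{P}_\alpha(t) := \mathcal{P}_\alpha[0, t]$ is the number of points in $[0, t]$. Define the process
$$M_\alpha(t) := e^{-t}\mathcal{P}_\alpha(t) - (1 + \alpha)(1 - e^{-t}), \qquad t \ge 0. \tag{4.2}$$

Lemma 4.5. The process $\{M_\alpha(t) : t \ge 0\}$ is a martingale with respect to the natural filtration
of $\mathcal{P}_\alpha$. In particular
$$E(\mathcal{P}_\alpha(t)) = (1 + \alpha)(e^{t} - 1). \tag{4.3}$$

Proof: Write $\{\mathcal{F}(t) : t \ge 0\}$ for the natural filtration of the process. It is enough to show for
all $t \ge 0$, $E(dM_\alpha(t) \mid \mathcal{F}(t)) = 0$. By construction
$$E(d\mathcal{P}_\alpha(t) \mid \mathcal{F}(t)) = (1 + \alpha + \mathcal{P}_\alpha(t))dt.$$
Further
$$E(dM_\alpha(t) \mid \mathcal{F}(t)) = e^{-t}E(d\mathcal{P}_\alpha(t) \mid \mathcal{F}(t)) - e^{-t}\mathcal{P}_\alpha(t)dt - (1 + \alpha)e^{-t}dt.$$
Substituting the first display into the second, the terms cancel, which completes the proof. The final assertion regarding (4.3) follows using the
martingale property of $M_\alpha$ and the initial condition $\mathcal{P}_\alpha(0) = 0$. ■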

The starting point in the analysis of continuous time branching processes is the so called
Malthusian rate of growth parameter $\lambda > 0$ which solves the equation
$$\int_0^\infty \lambda e^{-\lambda t} E(\mathcal{P}_\alpha(t))dt = 1. \tag{4.4}$$
Using Lemma 4.5 now implies
$$\lambda = 2 + \alpha. \tag{4.5}$$
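As a quick sanity check of (4.5), one can substitute the mean (4.3) into the left side of (4.4) and integrate numerically. The sketch below uses a plain trapezoidal rule; the function name, the truncation point `upper` and the number of steps are illustrative choices, not part of the paper.

```python
import math

def malthusian_lhs(lam, alpha, upper=60.0, steps=200_000):
    """Trapezoidal approximation of the left side of (4.4),
    int_0^inf lam * exp(-lam * t) * (1 + alpha) * (exp(t) - 1) dt,
    with E(P_alpha(t)) taken from (4.3). Requires lam > 1."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        # Integrand expanded as exp((1 - lam) t) - exp(-lam t) to avoid overflow.
        f = lam * (1 + alpha) * (math.exp((1 - lam) * t) - math.exp(-lam * t))
        total += f if 0 < i < steps else f / 2
    return total * h
```

For instance `malthusian_lhs(2 + alpha, alpha)` should be close to 1 for any `alpha >= 0`, in line with the exact value $(1+\alpha)/(\lambda - 1)$ of the integral.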

Let $T_\lambda$ be an exponential random variable with parameter $\lambda$ independent of $\mathcal{P}_\alpha$ and consider
the integer valued random variable $\mathcal{P}_\alpha(T_\lambda)$. Note that (4.4) is equivalent to $E(\mathcal{P}_\alpha(T_\lambda)) = 1$.
Recall that $D_\alpha$ is a random variable with the (non-change point) degree distribution (1.1). It
is easy to check that $D_\alpha - 1 \stackrel{d}{=} \mathcal{P}_\alpha(T_\lambda)$. In particular for $\alpha \ge 0$,
$$E\big(\mathcal{P}_\alpha(T_\lambda)\log^{+}\mathcal{P}_\alpha(T_\lambda)\big) < \infty.$$
Using standard Jagers-Nerman stable age-distribution theory for branching processes [34, 35]
now implies the following.
Proposition 4.6. There exists an integrable a.s. positive random variable $W_\alpha$ such that
$$e^{-(2+\alpha)t}|\mathrm{BP}_\alpha(t)| \xrightarrow{\text{a.e.},\,L^1} W_\alpha.$$
In particular
$$\tau_n - \frac{1}{2 + \alpha}\log n \xrightarrow{\text{a.e.}} W'_\alpha \tag{4.6}$$
for a finite random variable $W'_\alpha$.

We conclude this Section with asymptotics for the amount of "continuous time" where the
attachment dynamics using $\beta$ is valid, namely $\tau_n - \tau_{\gamma n}$. Recall the constant $a$ from (1.5). We
will also write $\{\mathcal{F}_n(t) : t \ge 0\}$ for the natural filtration of the process $\{\mathrm{BP}_\theta^n(t) : t \ge 0\}$.

Lemma 4.7. Let $\Upsilon_n = \tau_n - \tau_{\gamma n}$ denote the time after the change point in the continuous
time embedding. Then
$$\sqrt{n}(\Upsilon_n - a) \xrightarrow{w} \frac{1}{2 + \beta}\sqrt{\frac{1 - \gamma}{\gamma}}\, Z$$
as $n \to \infty$. Here $Z$ is a standard normal random variable.


Proof: Note that $\mathrm{BP}_\theta(\cdot)$ is a Markov process. Further for $t \ge \tau_{\gamma n}$, conditional on $\mathrm{BP}_\theta^n(t)$, the
rate at which a new individual is born into the system is given by
$$\lambda(t) := \sum_{v \in \mathrm{BP}_\theta(t)} (d_v(t) + 1 + \beta) = (2 + \beta)|\mathrm{BP}_\theta(t)| - 1. \tag{4.7}$$
In particular
$$\Upsilon_n \stackrel{d}{=} \sum_{j = \gamma n}^{n-1} \frac{E_j}{(2 + \beta)j - 1}, \tag{4.8}$$
where $\{E_i : i \ge 1\}$ is a sequence of iid rate one exponential random variables. Using Lyapunov's
central limit theorem now completes the proof. ■
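The representation (4.8) makes the centering and scaling in Lemma 4.7 easy to check numerically, since a rate one exponential has mean and variance 1. The sketch below compares the exact mean and variance of the sum in (4.8) with $a$ and with $(1-\gamma)/(\gamma(2+\beta)^2 n)$. It assumes that the constant in (1.5) is $a = \frac{1}{2+\beta}\log\frac{1}{\gamma}$, which is what the growth rate $2+\beta$ suggests; the function names are illustrative.

```python
import math

def upsilon_moments(n, beta, gamma):
    """Exact mean and variance of the sum in (4.8): independent exponentials
    E_j with unit mean and variance, scaled by 1 / ((2 + beta) j - 1)."""
    start = int(gamma * n)
    mean = sum(1.0 / ((2 + beta) * j - 1) for j in range(start, n))
    var = sum(1.0 / ((2 + beta) * j - 1) ** 2 for j in range(start, n))
    return mean, var

def upsilon_limits(n, beta, gamma):
    """Assumed limit a = log(1/gamma) / (2 + beta) and the variance
    (1 - gamma) / (gamma (2 + beta)^2 n) implied by Lemma 4.7."""
    a = math.log(1.0 / gamma) / (2 + beta)
    var_limit = (1 - gamma) / (gamma * (2 + beta) ** 2 * n)
    return a, var_limit
```

With `n = 100000`, `beta = 1` and `gamma = 0.5`, the two means agree to several decimal places and the ratio of the two variances is close to 1.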

Using the distributional characterization in (4.8) and standard concentration inequalities
for sums of independent random variables, one can show the following tail bound on $\Upsilon_n$. We
omit the proof.

Lemma 4.8. For any $\kappa > 0$ there exists $N = N(\kappa) < \infty$ such that for all $n > N(\kappa)$,
$$P\big(|\Upsilon_n - a| > n^{-1/3}\big) \le \frac{1}{n^{\kappa}}.$$
In particular by Borel-Cantelli, $P(|\Upsilon_n - a| \le n^{-1/3} \text{ eventually}) = 1$.

Here the bound $n^{-1/3}$ was arbitrary. An upper bound of $n^{-(1/2 - \delta)}$ with any $\delta > 0$ would
give the same result. We fix $n^{-1/3}$ for definiteness. We end this Section by
defining the Yule process. Properties of this process will be needed in the next few Sections.


Definition 4.9 (Rate $\nu$ Yule process). Fix $\nu > 0$. A rate $\nu$ Yule process is a pure birth process
$\{Y_\nu(t) : t \ge 0\}$ with $Y_\nu(0) = 1$ and where the rate of birth of new individuals is proportional
to the size of the current population. More precisely
$$P\big(Y_\nu(t + dt) - Y_\nu(t) = 1 \mid \mathcal{F}(t)\big) := \nu Y_\nu(t)dt + o(dt),$$
where $\{\mathcal{F}(t) : t \ge 0\}$ is the natural filtration of the process.

The following is a standard property of the Yule process, see e.g. [45, Section 2.5].

Lemma 4.10. Fix time $t > 0$ and rate $\nu > 0$. Then the random variable $Y_\nu(t)$, namely the
number of individuals in the population by time $t$, has a Geometric distribution with parameter
$p = e^{-\nu t}$, namely
$$P(Y_\nu(t) = k) = e^{-\nu t}(1 - e^{-\nu t})^{k-1}, \qquad k \ge 1.$$
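Lemma 4.10 is easy to probe by simulation. The sketch below draws the size of a Yule process at a fixed time by summing exponential holding times, following Definition 4.9; the function name and the use of Python's `random` module are illustrative choices.

```python
import math
import random

def yule_size(rate, t, rng):
    """Population at time t of a rate-`rate` Yule process started from one
    individual: while the size is s, the next birth occurs at rate rate * s."""
    size, clock = 1, 0.0
    while True:
        clock += rng.expovariate(rate * size)
        if clock > t:
            return size
        size += 1
```

Under Lemma 4.10 the sample mean of `yule_size(nu, t, rng)` over many draws should be close to $e^{\nu t}$ (the mean of a Geometric variable with parameter $e^{-\nu t}$), and the fraction of draws equal to 1 should be close to $e^{-\nu t}$.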


4.2. Convergence of the degree distribution. In this Section we will prove Theorem 2.1.
Recall the description of the limit random variable $D_\theta$ in Section 1.3. It will be easier to
deal with the random variable $D_\theta^{out} := D_\theta - 1$. Then the distribution of $D_\theta^{out}$ can be written
succinctly as:

(a) with probability $\gamma$, $D_\theta^{out} := Y_{BC}$ where $Y_{BC} = D_\alpha - 1 + N^{\beta}_{D_\alpha}[0, a]$;

(b) with probability $1 - \gamma$, $D_\theta^{out} = Y_{AC}$ where $Y_{AC} := X_{AC}$ and $X_{AC}$ is as defined in Section 1.3.


Now recall that for any time $t$ and vertex $v$ born before time $t$, $d_v(t)$ denotes the number of
children (out-degree) of vertex $v$ at time $t$. For fixed $k \ge 0$ define
$$N_n^{BC}(k) := \sum_{v \in \mathrm{BP}_\theta(\tau_n)} 1\{T_v \le \tau_{\gamma n},\ d_v(\tau_n) \ge k\}, \tag{4.9}$$
and
$$N_n^{AC}(k) := \sum_{v \in \mathrm{BP}_\theta(\tau_n)} 1\{T_v > \tau_{\gamma n},\ d_v(\tau_n) \ge k\}. \tag{4.10}$$
In words, $N_n^{BC}(k)$ is the number of vertices that were born before the change point and have
out-degree at least $k$ by time $\tau_n$ (thus in the tree $\mathcal{T}_{\theta,n}$) whilst $N_n^{AC}(k)$ is defined analogously
but for vertices born after the change point $\tau_{\gamma n}$. The following proposition is equivalent to
Theorem 2.1.


Proposition 4.11. Fix $k \ge 0$. Then we have
$$\frac{N_n^{BC}(k)}{n} \longrightarrow \gamma P(Y_{BC} \ge k), \qquad \frac{N_n^{AC}(k)}{n} \longrightarrow (1 - \gamma)P(Y_{AC} \ge k), \tag{4.11}$$
as $n \to \infty$.

The rest of this Section deals with proving this Proposition.


4.2.1. Analysis of $N_n^{BC}(\cdot)$: We start with the easier case. We will need some more notation.
For fixed $0 \le j \le k$, define $N_n^{BC}(j : k)$ for the number of vertices that were born before the
change point $\tau_{\gamma n}$ with out-degree exactly $j$ at time $\tau_{\gamma n}$ that end up with at least $k$ children
by time $\tau_n$. Note that

Nn C {j ■ k) = N n (k + 1, 7 n) 

j^k 


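namely the number of vertices with total degree $k + 1$ (thus out-degree $k$) in the tree before
the change point, $\mathcal{T}_{\gamma n}$. Recall that by Lemma 4.4 the asymptotic degree distribution of $\mathcal{T}_{\gamma n}$ is $D_\alpha$
and thus the asymptotic out-degree distribution of the tree $\mathcal{T}_{\gamma n}$ is $D_\alpha^{out} = D_\alpha - 1$. Using the
form of $Y_{BC}$, it is thus enough to show for each fixed $0 \le j \le k$,
$$\frac{N_n^{BC}(j : k)}{n} \longrightarrow \gamma P(D_\alpha^{out} = j)\, P\big(\mathcal{P}_\beta^{j+1}[0, a] \ge k - j\big). \tag{4.12}$$

We start with the following simple Lemma.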
Lemma 4.12. Fix $0 < p, q < 1$, a sequence of non-negative integer valued random variables
$\{N_n : n \ge 1\}$ and a sequence $\{q_n : n \ge 1\} \subseteq [0, 1]$. Conditional on $N_n$, let $S_n$ be a
Binomial$(N_n, q_n)$ random variable. Further suppose
$$\frac{N_n}{n} \xrightarrow{\text{a.e.}} p, \qquad q_n \to q.$$
Then $S_n/n \xrightarrow{\text{a.e.}} pq$.

Proof: We assume we work on a rich enough probability space where we can couple
$\{S_n : n \ge 1\}$ with a sequence $\{S'_n : n \ge 1\}$ where $S'_n$ is Binomial$(np, q_n)$ such that
$|S_n - S'_n| \le |N_n - np|$. Standard exponential tail bounds for the Binomial distribution
coupled with Borel-Cantelli and the hypothesis of the Lemma imply that $S'_n/n \xrightarrow{\text{a.e.}} pq$. Since
$|S_n - S'_n|/n \le |N_n/n - p|$, again using the hypothesis of the Lemma completes the proof. ■
We now proceed with the proof. Analogous to $N_n^{BC}(j : k)$, for each $s \ge 0$ define $Z_n^{BC}((j : k), s)$
for the number of vertices born before the change point $\tau_{\gamma n}$ such that at $\tau_{\gamma n}$ they have
out-degree exactly $j$ and further by time $\tau_{\gamma n} + s$ they have at least $k$ children. Then note that
conditional on the information at time $\tau_{\gamma n}$,
$$Z_n^{BC}((j : k), s) \stackrel{d}{=} \mathrm{Bin}\big(N_n(j + 1, \gamma n),\ P(\mathcal{P}_\beta^{j+1}[0, s] \ge k - j)\big). \tag{4.13}$$
Further the random variables of interest satisfy $N_n^{BC}(j : k) = Z_n^{BC}((j : k), \Upsilon_n)$ where $\Upsilon_n$ is as in
Lemma 4.7. Thus writing $a_n^{+} = a + n^{-1/3}$ and $a_n^{-} = a - n^{-1/3}$ and using Lemma 4.8,
$$Z_n^{BC}((j : k), a_n^{-}) \le N_n^{BC}(j : k) \le Z_n^{BC}((j : k), a_n^{+}) \quad \text{eventually a.s.} \tag{4.14}$$
Using the Binomial convergence Lemma 4.12 and noting that by Lemma 4.4 and the choice of
$a_n^{-}, a_n^{+}$ the hypotheses of this Lemma are satisfied, implies that
$$\frac{Z_n^{BC}((j : k), a_n)}{n} \xrightarrow{\text{a.e.}} \gamma P(D_\alpha^{out} = j)\, P\big(\mathcal{P}_\beta^{j+1}[0, a] \ge k - j\big),$$
where we take $a_n$ as either $a_n^{+}$ or $a_n^{-}$. Now using (4.14) proves (4.12). This completes the analysis
of $N_n^{BC}(\cdot)$.

4.2.2. Analysis of $N_n^{AC}(\cdot)$: We start by setting up some notation. Fix $k \ge 0$ and define the
function
$$g_k(u) := P(\mathcal{P}_\beta[0, u] \ge k), \qquad u \ge 0. \tag{4.15}$$
Here $\mathcal{P}_\beta$ is the offspring point process with attachment parameter $\beta$. Then writing out the
form of the distribution of $Y_{AC}$ more explicitly (and using the definition of $a$ from (1.5)), to
prove the second assertion of (4.11), we want to show
$$\frac{N_n^{AC}(k)}{n} \longrightarrow \gamma\int_0^a (2 + \beta)e^{(2+\beta)u} g_k(a - u)du. \tag{4.16}$$
For $s \ge 0$, define $Z_n^{AC}(k, s)$ for the number of individuals born in the interval $[\tau_{\gamma n}, \tau_{\gamma n} + s]$
such that by time $\tau_{\gamma n} + s$, these vertices have at least $k$ children. Then note that $N_n^{AC}(k) =
Z_n^{AC}(k, \Upsilon_n)$. Mimicking the proof for $N_n^{BC}(k)$, it is enough to show that
$$\frac{Z_n^{AC}(k, a_n)}{n} \longrightarrow \gamma\int_0^a (2 + \beta)e^{(2+\beta)u} g_k(a - u)du, \tag{4.17}$$
where $a_n$ is either the sequence $a_n^{-} = a - n^{-1/3}$ or $a_n^{+} = a + n^{-1/3}$. To ease notation we
will just work with the sequence $a_n = a$. The entire proof goes through by replacing $a$ in the
steps below by $a_n$.

We start with a few preliminary results. The first result describes strong concentration
of the growth of the number of individuals in $\mathrm{BP}_\theta$ in the interval $[\tau_{\gamma n}, \tau_{\gamma n} + a]$. Define
the process
$$\mathcal{Z}_n(u) := |\mathrm{BP}_\theta(\tau_{\gamma n} + u)|, \qquad 0 \le u \le a. \tag{4.18}$$

Proposition 4.13. There exists a constant $C < \infty$ such that for all $n$,
$$P\Big(\sup_{0 \le u \le a}\big|\mathcal{Z}_n(u) - n\gamma e^{(2+\beta)u}\big| > \sqrt{n}\log n\Big) \le \frac{C}{(\log n)^2}.$$

Proof: The plan is to use Doob's $L^2$-maximal inequality for continuous time martingales
(see e.g. [36, Chapter 1.9]). For this we will need to derive martingales related to the process
$\mathcal{Z}_n(\cdot)$. Throughout we will write $\{\mathcal{F}_t^n : 0 \le t \le a\}$ for the filtration generated by $\{\mathrm{BP}_\theta(\tau_{\gamma n} + t) : 0 \le t \le a\}$.
Recall from the rate description in (4.7) that $\mathcal{Z}_n(\cdot)$ is a pure birth process such that for any $t \ge 0$,
conditional on $\mathcal{F}_t^n$, $\mathcal{Z}_n(t) \to \mathcal{Z}_n(t) + 1$ at rate $(2 + \beta)\mathcal{Z}_n(t) - 1$. Arguing as in the proof of
Lemma 4.5 it is easy to check that the process
$$M_1(t) := \big(e^{-(2+\beta)t}\mathcal{Z}_n(t) - n\gamma\big) - \frac{e^{-(2+\beta)t} - 1}{2 + \beta}, \qquad 0 \le t \le a, \tag{4.19}$$
is a mean zero martingale. This in particular gives that
$$e^{-(2+\beta)t}E(\mathcal{Z}_n(t)) = n\gamma + \frac{e^{-(2+\beta)t} - 1}{2 + \beta}, \qquad 0 \le t \le a. \tag{4.20}$$

By Doob's $L^2$-maximal inequality applied to the process $M_1(\cdot)$ we have for any $\lambda > 0$,
$$P\Big(\sup_{0 \le t \le a}\Big|\big(e^{-(2+\beta)t}\mathcal{Z}_n(t) - n\gamma\big) - \frac{e^{-(2+\beta)t} - 1}{2 + \beta}\Big| > \lambda\Big) \le \frac{E(M_1^2(a))}{\lambda^2}. \tag{4.21}$$
If we can show there exists a constant $C < \infty$ such that $E(M_1^2(a)) \le Cn$, using $\lambda = .5\sqrt{n}\log n$
and algebraic manipulation of (4.21) completes the proof. So let us now derive this bound on
$E(M_1^2(a))$.


First squaring the expression in (4.19), expanding and using (4.20) gives for $t \ge 0$,
$$E(M_1^2(t)) = E\big(e^{-(2+\beta)t}\mathcal{Z}_n(t) - n\gamma\big)^2 - \Big(\frac{e^{-(2+\beta)t} - 1}{2 + \beta}\Big)^2. \tag{4.22}$$
Thus we need to understand the evolution of the process $\mathcal{Z}_n^2(\cdot)$. Again using the rate description
of $\mathcal{Z}_n$, this process undergoes a change
$$\Delta\mathcal{Z}_n^2(t) := \mathcal{Z}_n^2(t+) - \mathcal{Z}_n^2(t) = 1 + 2\mathcal{Z}_n(t),$$
at rate $(2 + \beta)\mathcal{Z}_n(t) - 1$. Using this one may check that the following process on $[0, a]$
$$M_2(t) := e^{-2(2+\beta)t}\mathcal{Z}_n^2(t) - \int_0^t e^{-2(2+\beta)s}\beta\mathcal{Z}_n(s)ds - \frac{e^{-2(2+\beta)t} - 1}{2(2 + \beta)} \tag{4.23}$$
is also a martingale. In particular since first moments are conserved,
$$E\big(e^{-2(2+\beta)t}\mathcal{Z}_n^2(t)\big) = n^2\gamma^2 + \int_0^t \beta e^{-2(2+\beta)s}E(\mathcal{Z}_n(s))ds + \frac{e^{-2(2+\beta)t} - 1}{2(2 + \beta)}. \tag{4.24}$$
Using (4.20) shows that there exists a constant $C$ such that for all $0 \le t \le a$,
$$\big|E\big(e^{-2(2+\beta)t}\mathcal{Z}_n^2(t)\big) - n^2\gamma^2\big| \le Cn. \tag{4.25}$$
Expanding the first bracket in (4.22), using (4.20) and (4.25) shows that $E(M_1^2(a)) \le Cn$ for
some constant $C$. This completes the proof. ■


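Now divide the interval $[\tau_{\gamma n}, \tau_{\gamma n} + a]$ into $an^{1/3}$ intervals of length $n^{-1/3}$:
$$\Big[\tau_{\gamma n}, \tau_{\gamma n} + \frac{1}{n^{1/3}}\Big],\ \Big[\tau_{\gamma n} + \frac{1}{n^{1/3}}, \tau_{\gamma n} + \frac{2}{n^{1/3}}\Big],\ \ldots,\ \Big[\tau_{\gamma n} + \frac{an^{1/3} - 1}{n^{1/3}}, \tau_{\gamma n} + \frac{an^{1/3}}{n^{1/3}}\Big].$$
To ease notation, write the above collection as $\{\mathcal{I}_i : 0 \le i \le an^{1/3} - 1\}$.
Further let $\tau_i^n = \tau_{\gamma n} + i/n^{1/3}$ with $\tau_0^n = \tau_{\gamma n}$ so that $\mathcal{I}_i = [\tau_i^n, \tau_{i+1}^n]$.

Now write $\mathrm{Birth}_i$ for the collection of vertices that were born in interval $\mathcal{I}_i$ (i.e. the collection
of vertices $v$ with birth times $T_v \in \mathcal{I}_i$) and write
$$\mathcal{Z}_n(\mathcal{I}_i) := |\mathrm{Birth}_i| = |\mathrm{BP}_\theta(\tau_{i+1}^n)| - |\mathrm{BP}_\theta(\tau_i^n)|,$$
for the number of individuals born in this interval. Then the following is an easy corollary of
Proposition 4.13.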


Corollary 4.14. We have
$$P\Big(\bigcap_{i=0}^{an^{1/3} - 1}\Big\{\big|\mathcal{Z}_n(\mathcal{I}_i) - (2 + \beta)\gamma n^{2/3}e^{(2+\beta)i/n^{1/3}}\big| < 2\sqrt{n}\log n\Big\}\Big) \to 1$$
as $n \to \infty$.

For future reference write for the event above namely 

$$\bigcap_{i=0}^{an^{1/3} - 1}\Big\{\big|\mathcal{Z}_n(\mathcal{I}_i) - (2 + \beta)\gamma n^{2/3}e^{(2+\beta)i/n^{1/3}}\big| < 2\sqrt{n}\log n\Big\}. \tag{4.26}$$


Now for each interval $\mathcal{I}_i$ we will partition the vertices born in this interval into two classes:

(a) The collection of good vertices $\mathcal{G}_i$: This consists of all $v \in \mathrm{Birth}_i$ such that they produce no
children by the end of the interval, i.e. vertices $v$ with $T_v \in [\tau_{\gamma n} + i/n^{1/3}, \tau_{\gamma n} + (i + 1)/n^{1/3}]$
such that by time $\tau_{\gamma n} + (i + 1)/n^{1/3}$, vertex $v$ still has no children. Note that since the
intervals are of time length $n^{-1/3}$, one expects a large proportion of vertices born in the
interval $\mathcal{I}_i$ to be good. Write $\mathcal{Z}_n^{good}(\mathcal{I}_i) = |\mathcal{G}_i|$ for the number of good vertices in $\mathcal{I}_i$.

(b) The collection of bad vertices $\mathcal{B}_i := \mathrm{Birth}_i \setminus \mathcal{G}_i$, the collection of vertices born in $\mathcal{I}_i$ which
produce at least one child by time $\tau_{\gamma n} + (i + 1)/n^{1/3}$. Write $\mathcal{Z}_n^{bad}(\mathcal{I}_i) = |\mathcal{B}_i|$ for the number of
such bad vertices in $\mathcal{I}_i$. Write
$$\mathcal{Z}_n^{bad} := \sum_{i=0}^{an^{1/3} - 1}\mathcal{Z}_n^{bad}(\mathcal{I}_i)$$
for the total number of bad vertices.

Fix a constant $C$ and define the event $B_i^n = \{\mathcal{Z}_n^{bad}(\mathcal{I}_i) \ge Cn^{1/3}\log n\}$. These events depend
on $C$ but we suppress this in the notation.


Proposition 4.15. We can choose a constant $C < \infty$ large such that $P\big(\bigcup_{i=0}^{an^{1/3} - 1} B_i^n\big) \to 0$ as
$n \to \infty$. In particular for the total number of bad vertices we have $\mathcal{Z}_n^{bad} = O_P(n^{2/3}\log n)$.
Proof: Fix an interval $\mathcal{I}_i$. Note that every bad vertex is one of two types:

(a) A vertex that is a direct child of a vertex born before this time interval. Write X> bad 
for these direct bad vertices and write li? bad (Xj) = |X> bad | for the number of such vertices. 
Further write lF bad (Xj) for the total number of descendants of direct bad vertices born in 
the interval X,; (including the direct bad vertices). 

(b) A vertex that is bad and is a child of a vertex born in X,;. Thus the parent of this vertex 
is necessarily bad. 

Thus in particular we have that «2^ bad (Xj) ^ ^ bad (Xj). Now note that direct bad vertices in 
X> bad are created via the following steps: 

(i) A descendant (maybe good or bad) of a vertex born before X* is born into the system. 

The number of such individuals S% n (X,) ^ 3f n (Ii), the total number of individuals born 
in the interval X,;. Using Corollary |4.14[ there exists a constant C such that whp as 

n —> oo, for all the intervals 0 ^ i ^ an 1 ' 3 — 1, M n (Xi) ^ Cn 2 / 3 . 

(ii) Conditional on all these descendants of vertices born before X,;, such a descendant has 

to give birth to one individual in the interval [i/n 1 / 3 , (i + 1 )/n 1 / 3 ]. Recall that the time 

to give birth to the first child is an exponential random variable E\ with rate (2 + /3). 

Thus the probability of birthing this first child is bounded by 

p n = T(E 1 ^n- 1 ^)^ 2 ±l. 

Further by construction none of these vertices can have a parent child relationship and 
thus their offspring lineages evolve independently. 

In particular, conditional on all descendants of vertices born before time interval X,;, 


r>bad ( 

'n 


d (Xj) ^ st Bin (& n (li),p n ) (4.27) 

Here st denotes stochastic domination. Thus using Corollary 4.14 (4.27) and standard tail 
bounds for the Binomial distribution implies that there exists a constant C < oo such that 

P(^ bad (X*) ^ Cn 1/3 log n VO z ^ an 1/3 - 1) -A 1, (4.28) 


as n 


oo. 


Let us now complete the analysis of ^ bad (Xj). Let us start with the evolution of descen¬ 
dants of a single bad direct vertex after it gives birth to its child. This process then starts 
reproducing at rate 2 + /3 + l + /3 = 3 + /3. Further whenever a new vertex is added to the 
system, the rate of production increases by at most 2 + f3. Thus writing K = [3 + /?J and 
v = 2 + /J, the numbe r of descendants of such a bad vertex can be bounded by a rate v Yule 
process (see Definition 4.9) that starts with K individuals at time zero. Write { Yj < (t) : t ^ 0} 
for such a process. Thus the number of descendants of such a bad vertex in the time interval 
+ z/n 1 / 3 , r in + (z + 1 )/n 1//3 ] can be stochastically bounded by Yff (zr^ 1 / 3 ). In particular, 


• 771 


conditional on X)( ad (Xj), 


7>bad 

'n,* 


^ ad (X) 

(XKst £ ^(n" 1/3 ). 


(4.29) 


3 = 1 


Here {y^(-) : j ^ 1 j are an iid collection of Yule processes with distribution (■). Using 

the explicit distribution of the Yule process at a fixed time (Lemma 4.10), it is easy to check 
that given constant C > 0 we can find A > 0 such that 

P (^ bad (X0 ^ 10AV7n 1//3 logn ^ bad (^) < Cn 1/3 logn) ^ exp(-An 1/3 ). (4.30) 
Using this exponential bound with (4.28) completes the proof. ■

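We now proceed with the proof of (4.17). For $0 \le i \le an^{1/3} - 1$, let $Z_n^{good}(k, a : \mathcal{I}_i)$ be the
number of good vertices in $\mathrm{Birth}_i$ which have at least $k$ children by time $a$. Then note that
conditional on $\mathrm{BP}_\theta(\tau_{i+1}^n)$,
$$Z_n^{good}(k, a : \mathcal{I}_i) \stackrel{d}{=} \mathrm{Bin}\Big(\mathcal{Z}_n^{good}(\mathcal{I}_i),\ g_k\Big(a - \frac{i + 1}{n^{1/3}}\Big)\Big). \tag{4.31}$$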


Define the events 
G? := 


( , (2+/3)i / 1 \ 

:= | Zr d (k, a:Zi)- 7 (2 + /3)n 2 / 3 e D/3 g k ^ _ — J 


< 


Cn 1//3 logn 


Proposition 4.16. There exists a constant C < oo such that P (f)-» 1 as n —> oo 


Proof: Note that $\mathcal{Z}_n^{good}(\mathcal{I}_i) = \mathcal{Z}_n(\mathcal{I}_i) - \mathcal{Z}_n^{bad}(\mathcal{I}_i)$. Combining Corollary 4.14 with Proposition
4.15 implies that
$$P\Big(\bigcap_{i=0}^{an^{1/3} - 1}\Big\{\big|\mathcal{Z}_n^{good}(\mathcal{I}_i) - (2 + \beta)\gamma n^{2/3}e^{(2+\beta)i/n^{1/3}}\big| < 3\sqrt{n}\log n\Big\}\Big) \to 1.$$
Now using the distributional identity (4.31) and standard tail bounds for the Binomial
distribution completes the proof. ■


We are finally in a position to complete the proof of (4.17). First note that
$$\sum_{i=0}^{an^{1/3} - 1} Z_n^{good}(k, a : \mathcal{I}_i) \le Z_n^{AC}(k, a) \le \sum_{i=0}^{an^{1/3} - 1} Z_n^{good}(k, a : \mathcal{I}_i) + \mathcal{Z}_n^{bad}.$$
Using Proposition 4.15, $n^{-1}\mathcal{Z}_n^{bad} \to 0$ in probability. Using Proposition 4.16,
$$\frac{1}{n}\sum_{i=0}^{an^{1/3} - 1} Z_n^{good}(k, a : \mathcal{I}_i) \sim \gamma(2 + \beta)\frac{1}{n^{1/3}}\sum_{i=0}^{an^{1/3} - 1} e^{(2+\beta)i/n^{1/3}} g_k\Big(a - \frac{i + 1}{n^{1/3}}\Big) \longrightarrow \gamma(2 + \beta)\int_0^a e^{(2+\beta)u} g_k(a - u)du. \tag{4.32}$$
This completes the proof of (4.11) and thus the assertion of the convergence of the degree
distribution of the model to the asserted limit in Theorem 2.1.
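The Riemann sum in (4.32) can be evaluated directly in the simplest non-trivial case $k = 1$. The sketch below assumes, from the reproduction rates in Section 4.1, that a newborn vertex has its first child at rate $1 + \beta$, so that $g_1(u) = 1 - e^{-(1+\beta)u}$, and that the constant in (1.5) is $a = \frac{1}{2+\beta}\log\frac{1}{\gamma}$. The function names and the closed form of the integral are worked out here for illustration and are not taken from the paper.

```python
import math

def g1(u, beta):
    """Assumed form of g_1 in (4.15): probability of at least one child by
    age u when the first child arrives at rate 1 + beta."""
    return 1.0 - math.exp(-(1 + beta) * max(u, 0.0))

def riemann_sum(n, beta, gamma):
    """Middle expression of (4.32) for k = 1, with mesh h = n^(-1/3)."""
    r = 2 + beta
    a = math.log(1.0 / gamma) / r
    h = n ** (-1.0 / 3.0)
    cells = int(a / h)
    s = sum(math.exp(r * i * h) * g1(a - (i + 1) * h, beta) for i in range(cells))
    return gamma * r * h * s

def integral_closed_form(beta, gamma):
    """gamma (2 + beta) int_0^a exp((2 + beta) u) g1(a - u) du, integrated by hand."""
    r = 2 + beta
    a = math.log(1.0 / gamma) / r
    c = 3 + 2 * beta
    return (1 - gamma) - gamma * r * math.exp(-(1 + beta) * a) * (math.exp(c * a) - 1) / c
```

For large `n` the two functions return nearly the same number. Under these assumptions it is the limiting fraction of all vertices that are born after the change point and are not leaves, and it is smaller than $1 - \gamma$, the total fraction born after the change point.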


We conclude this Section with a related result regarding the evolution of the degree distri¬ 
bution. This follows by di rectl y modifying the proof above. Recall the definitions of N n (k , m) 
and N n (k,t ) from Section 


2.2 


For future use define for each k > 1 and 0 < t sj 1 


N n ,^(k, m) = YN n (j, m), N n ^(k,t) = y ^N n (j,t) 


E 


(4.33) 


namely the number of vertices with degree at least k respectively at discrete time m and at 
time t when we rescale time by n. Write q>(k,t) = N n ^(k,t)/n. Note that since we divide 
by n and not nt in this expression we have E/Ei Q>\k, t) = t. Now note that by Lemma 
we have for each fixed 0 < t ^ 7 , 


4.4 
where $p_\alpha(k)$ as in (1.1) is the limiting degree distribution with no change point. For $\gamma \le t \le 1$,
analogous to the definition of $a$ in (1.5) define
$$a(t) := \frac{1}{2 + \beta}\log\frac{t}{\gamma}. \tag{4.35}$$
Analogous to the definition of $D_\theta$ in Section 1.3 define $D_\theta(t)$ by replacing $a$ by $a(t)$ throughout
the construction. Thus $D_\theta = D_\theta(1)$. Let
$$p^{(\infty)}(k, t) := P(D_\theta(t) = k), \qquad k \ge 1,\ \gamma \le t \le 1. \tag{4.36}$$
Let $p_{\ge}^{(\infty)}(k, t) = P(D_\theta(t) \ge k)$. For $0 \le t \le 1$, let $q_{\ge}^{(\infty)}(k, t) = t\,p_{\ge}^{(\infty)}(k, t)$.

Proposition 4.17. For all k ^ 1 we have 

=(")(u +\ n -M p 


sup \q£’(k,t) - q^’{k,t)\ 
07t71 


0 , 


as n —> 00. 

Proof: For fixed t ^ 7 , define the stopping time 

Tin = inf {s : | BPg(s)| = tn} , 

namely the first time that the continuous time embedding reaches size tn. Note that at this 
time, the corresponding tree has distribution Ttn ■ Write T n (f) = Tt n ~ T in for the amount of 
(continuous) time it takes for the process to reach this size after the change point. Then note 
that by Proposition |4.13 we can choose an appropriate constant C < 00 such that 


P sup |T(f) — a(f)| ^ C 
\77f71 


log n 


n 


L 


(4.37) 


as n —> 00 , where a(t ) is as defined in ( 4.35[ ). Repeating the above proof for the convergence 
of degree distribution and replacing a by o(t) throughout the argument shows that for each 

t ^ 7 N n ^(k,t)/nt —> P (Dg(t) ^ k ). Combining this with (4.34) implies that we have 
pointwise convergence q>\k,t) —> q^°\k,t). Now note that for each fixed n, the function 
q>\k,-) is non-decreasing on [0,1] while the limit function is also monotonically increasing 
and continuous (and thus uniformly continuous). Given e > 0, fix 5 > 0 such that for any 
t, s G [ 0 , 1 ] with |t — s| < 6, 

l?7(M) - «7(M)I < |. 

Divide [0,1] into intervals {[i<5, (i + 1)<5]} for 1 75 i ^ 1/5 of length 5. Via the pointwise 
convergence above, get no < 00 large such that for all n > n 0 


P ( sup \q^\k, id) - q^°\k,id)\ < - ) ^ 1 — e. 


77*7 J 


(4.38) 


Write G n (e , 6) for the event in the above equation. Then on this event, by the choice of 5, for 
all* we have \q>\k,id) — (k, {i + 1)<5)| 75 e/2. Using monotonicity, for any t G [id, (i + 1)5], 

\q^\k,id) — q^\k,t)\ 75 e/2. By the triangle inequality on G n (e,d), for all t € [0,1] and 
n > no, 

I q>\k,t) - q^\k,t)\ ^ \q^\k,t) - q ( T\k,id)\ + \q^\k,iS) - q < C ) {k,id)\ 


+ | q^\k,id) - q^°\k,t)\ ^7 + 7 + 7 = e. 




Since no is independent of f, this completes the proof. 
4.3. Proof of the tail exponent for the limiting degree distribution. The aim of this
Section is to prove the asserted tail bound, namely (2.1). First note that the lower tail bound
is obvious since with probability $\gamma$, $D_\theta$ stochastically dominates $D_\alpha$ and by (1.2), $D_\alpha$ has the
asserted tail behavior. The main crux is then proving the upper bound, namely
$$P(D_\theta \ge x) \le c'/x^{2+\alpha}. \tag{4.39}$$
Recall Definition 4.9 of the Yule process and in particular Lemma 4.10 on the finite time marginal
distribution of the Yule process.

Now note that in the description of the limit random variable $D_\theta$, with probability $1 - \gamma$,
$D_\theta = N_\beta[0, \mathrm{Age}] \le_{st} N_\beta[0, a]$ where as before $\le_{st}$ represents stochastic domination. Now
define

v = 2 + fi, K = [1 + (4.40) 

Let $Y_\nu^K$ be a rate $\nu$ Yule process started with $K$ individuals at time zero. Comparing the rate
of production of new individuals in the point process $\mathcal{P}_\beta$ with $Y_\nu^K$, we get that $N_\beta[0, a] \le_{st}
Y_\nu^K(a)$. By Lemma 4.10, $Y_\nu^K(a)$ is the sum of $K$ independent Geometric random variables.
Using the fact that a geometric random variable has finite moment generating function in a
neighborhood of zero and an elementary Chernoff bound implies that there exist constants
$\kappa, \kappa' > 0$ such that for all $x \ge 1$, we have an exponential tail bound,
$$P(N_\beta[0, \mathrm{Age}] > x) \le P(Y_\nu^K(a) > x) \le \kappa'\exp(-\kappa x). \tag{4.41}$$
Thus when, with probability $1 - \gamma$, $D_\theta = N_\beta[0, \mathrm{Age}]$, the corresponding random variable
has exponential tail. Thus the main contribution to the tail arises when, with probability $\gamma$,
$D_\theta = D_\alpha + N^{\beta}_{D_\alpha}[0, a]$. Arguing as above (and assuming $\beta \ge 1$), conditional on $D_\alpha = k$, we
have
$$N^{\beta}_{D_\alpha}[0, a] \le_{st} \sum_{j=1}^{k} Y_{\nu,K}^{(j)}(a),$$
where $\{Y_{\nu,K}^{(j)}(\cdot) : j \ge 1\}$ are a collection of independent rate $\nu$ Yule processes each started
at time zero with $K$ individuals and independent of $D_\alpha$. The following elementary Lemma
completes the proof.


Lemma 4.18. Let $D \ge 1$ be a positive integer valued random variable with $P(D \ge x) \le
c/x^{\gamma}$ for all $x \ge 1$, for two constants $c, \gamma > 0$. Let $\{Y_i : i \ge 1\}$ be a sequence of independent
and identically distributed positive integer valued random variables, independent of $D$. Consider
the random variable $D^* := \sum_{i=1}^{D} Y_i$. If $Y_1$ has finite moment generating function in a
neighborhood of zero then there exists a constant $c' > 0$ such that for all $x \ge 1$,
$$P(D^* \ge x) \le c'/x^{\gamma}.$$

Proof: For the rest of the proof, write $\mu = E(Y_1) < \infty$. Then note that
$$P(D^* \ge x) \le \sum_{j=1}^{x/2\mu} P(D = j)P\Big(\sum_{i=1}^{j} Y_i \ge x\Big) + P\Big(D \ge \frac{x}{2\mu}\Big) \le P\Big(\sum_{i=1}^{x/2\mu} Y_i \ge x\Big) + \frac{c(2\mu)^{\gamma}}{x^{\gamma}},$$
where the second inequality follows using the fact that $Y_i \ge 1$ for all $i$ and the tail bound for $D$
from the hypothesis of the Lemma. To complete the proof, note that standard large deviation
bounds imply (since $Y_i$ has a finite moment generating function about zero) that there
exist constants $\kappa, \kappa'$ such that for all large $x$
$$P\Big(\sum_{i=1}^{x/2\mu} Y_i \ge x\Big) \le \kappa'\exp(-\kappa x).$$
This completes the proof. ■

The only item left to complete the proof of Theorem 2.1 is to show that the change point
does change the degree distribution from the original (no change point) model. In Section
5 we will carry out a detailed analysis of the density of leaves which in particular will show
that the asymptotic density of leaves $p_\theta(1) \ne p_\alpha(1)$.


4.4. Analysis of the maximal degree. The aim of this Section is to prove Theorem 2.2. For simplicity and to ease notation, we will deal with $k = 1$, namely just the maximal degree. The general case follows in an identical fashion. Write $M_{\gamma n}(1)$ for the maximal degree of a vertex in $\mathcal{T}_{\gamma n}$, namely in the tree just before the change point. Then (1.3) implies that $M_{\gamma n}(1)/n^{1/(2+\alpha)}$ converges weakly to a strictly positive random variable. Since $M_n(1) \ge M_{\gamma n}(1)$, this implies that given any $\varepsilon > 0$, there exists a constant $K'_\varepsilon > 0$ such that

$$\liminf_{n \to \infty} P\Big(\frac{M_n(1)}{n^{1/(2+\alpha)}} > K'_\varepsilon\Big) > 1 - \varepsilon.$$

Thus to complete the proof of Theorem 2.2 we need to show that, given any $\varepsilon > 0$, there exists $K_\varepsilon < \infty$ such that

$$\limsup_{n \to \infty} P\Big(\frac{M_n(1)}{n^{1/(2+\alpha)}} < K_\varepsilon\Big) > 1 - \varepsilon. \qquad (4.42)$$


For any vertex $v \in [n]$ and time point $m \in [n]$, write $\deg(v,m)$ for the degree of vertex $v$ in $\mathcal{T}_m$, with the obvious convention that $\deg(v,k) = 0$ if $k < v$. Then note that $M_n(1) = \max(M_{pre}(n), M_{post}(n))$ where

$$M_{pre}(n) := \max_{v \in [1, \gamma n]} \deg(v,n), \qquad M_{post}(n) := \max_{v \in [\gamma n + 1, n]} \deg(v,n). \qquad (4.43)$$

Let us first analyze the maximal degree of vertices that appeared after the change point. Recall the constant $a$ from (1.5) and $\nu$, $K$ from (4.40).


Lemma 4.19. We have $P(M_{post}(n) > 2Ke^{\nu(a+1)} \log n) \to 0$ as $n \to \infty$.

Proof: We will assume $\beta \ge 1$ below; otherwise replace $\beta$ with one in the rest of the argument. For simplicity write $k_n = 2Ke^{\nu(a+1)} \log n$. Recall that in the continuous time embedding, $\tau_v$ represents the time of birth of vertex $v$ and further, for $v \in [\gamma n + 1, n]$, each such vertex is equipped with an offspring point process $\mathcal{P}^\beta_v$. As in Section 4.3, $1 + \mathcal{P}^\beta_v \preceq_{st} Y_{\nu,K}$, where $Y_{\nu,K}$ is a rate $\nu$ Yule process started with $K$ individuals at time zero. Now note that via our continuous time embedding,

$$M_{post}(n) := \max_{v \in [\gamma n + 1, n]} \big(1 + \mathcal{P}^\beta_v(0, \tau_n - \tau_v)\big),$$

since by time $\tau_n$, a vertex born after the change time has been alive for $\tau_n - \tau_v \le \tau_n - \tau_{\gamma n} := \Upsilon_n$ units of time. Now

$$P(M_{post}(n) > k_n) \le P(M_{post}(n) > k_n, \Upsilon_n < a + 1) + P(\Upsilon_n > a + 1) \le P\Big(\max_{v \in [\gamma n + 1, n]} \big(1 + \mathcal{P}^\beta_v(0, a + 1)\big) > k_n\Big) + P(\Upsilon_n > a + 1). \qquad (4.44)$$

Using Lemma 4.7 we have $\limsup_{n \to \infty} P(\Upsilon_n > a + 1) = 0$. Let $\{Y^{v}_{\nu,K} : v \in [\gamma n + 1, n]\}$ be a family of independent rate $\nu$ Yule processes started with $K$ individuals at time zero. Using Lemma 4.10, a simple union bound and the choice of $k_n$ implies $P(\max_{v \in [\gamma n + 1, n]} Y^{v}_{\nu,K}(a + 1) > k_n) \to 0$. ■
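The union bound at the end of this proof can be checked numerically. The following is a minimal sketch, not the authors' argument: it assumes, as stated later around (4.47), that a rate $\nu$ Yule process started from one individual is Geometric with success probability $e^{-\nu(a+1)}$ at time $a+1$, and it bounds the tail of a sum of $K$ such variables by the crude observation that some summand must exceed $k_n/K$. The function name and the parameter values are hypothetical.

```python
import math


def post_change_union_bound(n, nu, a, K):
    """Union bound on P(max over at most n vertices of Y(a + 1) > k_n).

    Y is a rate-nu Yule process started with K individuals, so Y(a + 1) is a
    sum of K independent Geometric(p) variables with p = exp(-nu * (a + 1)).
    """
    p = math.exp(-nu * (a + 1.0))
    k_n = 2.0 * K * math.exp(nu * (a + 1.0)) * math.log(n)
    # If the sum of K geometrics exceeds k_n, some summand exceeds k_n / K,
    # and P(Geometric(p) > x) = (1 - p) ** floor(x) for x >= 0.
    single_tail = (1.0 - p) ** math.floor(k_n / K)
    return n * K * single_tail


if __name__ == "__main__":
    # Illustrative (hypothetical) parameter values.
    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, post_change_union_bound(n, nu=1.0, a=0.5, K=2))
```

With this choice of $k_n$ each single tail is at most of order $n^{-2}$, so the bound decays roughly like $K/n$, which is why the factor 2 in $k_n$ suffices.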

Thus the above Lemma implies that the maximal degree amongst vertices that arrive after the change point is $O_P(\log n)$. To complete the proof of (4.42), it is enough to show that (4.42) holds with $M_n(1)$ replaced by $M_{pre}(1)$. Thus fix $\varepsilon \in (0,1)$. Using Proposition 4.6, fix $A = A_\varepsilon$ such that

$$\limsup_{n \to \infty} P\Big(\tau_{\gamma n} - \frac{1}{2+\alpha} \log \gamma n > A\Big) \le \varepsilon/2. \qquad (4.45)$$


Now consider the following process $BP^*_\theta$:

(a) Run the process $BP_\alpha$ till time $t_n(A) := \frac{1}{2+\alpha} \log \gamma n + A$.

(b) At this time, all vertices in $BP_\alpha(t_n)$ switch to the dynamics with parameter $\beta$, namely each vertex now reproduces at rate proportional to its out-degree $+\, 1 + \beta$.

(c) Run this process for an additional $a + 1$ units of time, where $a$ is as in (1.5).

Abusing notation, let $M^*_{pre,A}(1)$ denote the maximal degree by time $t_n + a + 1$ of all vertices born before time $t_n$. We can obviously couple the original process $BP_\theta$ and $BP^*_\theta$ such that on the set $\{\tau_{\gamma n} - \frac{1}{2+\alpha} \log \gamma n \le A,\ \Upsilon_n \le a + 1\}$ we have $M_{pre}(1) \le M^*_{pre,A}(1)$.


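Further note that for any fixed $K$ we have

$$P\big(M_{pre}(1) > Kn^{1/(2+\alpha)}\big) \le P\Big(M_{pre}(1) > Kn^{1/(2+\alpha)},\ \Upsilon_n < a + 1,\ \tau_{\gamma n} < \frac{1}{2+\alpha} \log \gamma n + A\Big) + P(\Upsilon_n > a + 1) + P\Big(\tau_{\gamma n} > \frac{1}{2+\alpha} \log \gamma n + A\Big).$$

First choosing $A$ appropriately as in (4.45) and using Lemma 4.7 we get that for any fixed $K$,

$$\limsup_{n \to \infty} P\big(M_{pre}(1) > Kn^{1/(2+\alpha)}\big) \le \limsup_{n \to \infty} P\big(M^*_{pre,A}(1) > Kn^{1/(2+\alpha)}\big) + \varepsilon/2.$$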


The following Lemma completes the proof of (4.42).

Lemma 4.20. Fix $A > 0$. Given any $\varepsilon > 0$, we can choose $K = K(A,\varepsilon) < \infty$ such that

$$\limsup_{n \to \infty} P\big(M^*_{pre,A}(1) > Kn^{1/(2+\alpha)}\big) \le \varepsilon.$$

Proof: First note that till time $t_n(A)$, the process $BP^*_\theta$ is the continuous time version of a (non-change point) preferential attachment model with attachment parameter $\alpha$. This continuous time embedding was used to derive asymptotics for the maximal degree in [6, 7]. In particular the bounds derived in these papers imply the following for a fixed $A$: write $\tilde{M}_n(1)$ for the maximal degree exactly at time $t_n(A)$. Then there exists $L = L(A,\varepsilon) < \infty$ such that

$$\limsup_{n \to \infty} P\big(\tilde{M}_n(1) > Ln^{1/(2+\alpha)}\big) \le \varepsilon/2. \qquad (4.46)$$

Now note that on the event $\{\tilde{M}_n(1) \le Ln^{1/(2+\alpha)}\}$, at time $t_n + a + 1$ the degree of every fixed vertex in the system is stochastically dominated by a rate $\nu$ Yule process started with $Ln^{1/(2+\alpha)}$ vertices at time zero and run for time $a + 1$, where $\nu$ is as in (4.40). Write $D_n$ for such a random variable and note that by the description of the dynamics of the Yule process and Lemma 4.10, we have that

$$D_n = \sum_{i=1}^{Ln^{1/(2+\alpha)}} Y_i(a + 1), \qquad (4.47)$$

where $\{Y_j(a + 1) : j \ge 1\}$ are iid Geometric random variables with $p = e^{-\nu(a+1)}$. Further note that using Proposition 4.6 on the size of the branching process, we can choose $C$ such that

$$\limsup_{n \to \infty} P\big(|BP^*_\theta(t_n)| > Cn\big) \le \varepsilon/2. \qquad (4.48)$$


Thus on the "good" event

$$\mathcal{G}_n := \big\{|BP^*_\theta(t_n)| \le Cn,\ \tilde{M}_n(1) \le Ln^{1/(2+\alpha)}\big\},$$

we have that

$$M^*_{pre,A}(1) \preceq_{st} \mathcal{M}_n := \max_{1 \le v \le Cn} D^v_n,$$

where $\{D^v_n : v \ge 1\}$ is an iid sequence with distribution (4.47). Note that $E(Y_i(a + 1)) = e^{\nu(a+1)}$. Let $K := 10Le^{\nu(a+1)}$. Then standard large deviations for the Geometric distribution imply that there exists a constant $C' > 0$ such that for all $n \ge 1$

$$P\big(D_n \ge Kn^{1/(2+\alpha)}\big) \le \exp\big(-C'n^{1/(2+\alpha)}\big).$$

Thus by the union bound,

$$P\big(\mathcal{M}_n > Kn^{1/(2+\alpha)}\big) \le Cn \exp\big(-C'n^{1/(2+\alpha)}\big) \to 0, \qquad (4.49)$$

as $n \to \infty$. Thus we have

$$\limsup_{n \to \infty} P\big(M^*_{pre,A}(1) > Kn^{1/(2+\alpha)}\big) \le \limsup_{n \to \infty} P(\mathcal{G}_n^c) + \limsup_{n \to \infty} P\big(\mathcal{M}_n > Kn^{1/(2+\alpha)}\big) \le \varepsilon,$$

using (4.46), (4.48) and (4.49). This completes the proof of the Lemma and thus the analysis of the maximal degree asymptotics.


5. Analysis of the proportion of leaves 


The aim of this Section is to prove Theorem 2.3. In the next section we will use the proportion of leaves (degree one vertices) to construct consistent estimators of the change point $\gamma$. We start in Section 5.1 by deriving strong error bounds between the expected proportion of leaves and the asserted limits in (2.3). Then in Section 5.2 we complete the proof of the functional central limit theorem. We start with some preliminary notation. For the rest of the proof, to ease notation, we will write $N_n(m) := N_n(1,m)$ for the number of leaves in $\mathcal{T}_m$ and let $N_n(t) = N_n(nt)$. Recall the asserted limiting proportion $\{p^{(\infty)}_t : 0 \le t \le 1\}$ from (2.3).

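For each $n \ge 2$ define the collection of real numbers $w_n = \{w_m : 2 \le m \le n - 1\}$ by

$$w_m := 1 - \frac{1+\alpha}{(2+\alpha)m - 1} \ \text{ for } m < n\gamma, \qquad w_m := 1 - \frac{1+\beta}{(2+\beta)m - 1} \ \text{ for } m \ge n\gamma. \qquad (5.1)$$

5.1. Expectation error bounds. The following Proposition is the main result of this Section.

Proposition 5.1. There exists a constant $C < \infty$ independent of $n$ such that the expectations satisfy

$$\sup_{n \ge 2}\ \sup_{t \in [0,1]} \big|E(N_n(t)) - nt\,p^{(\infty)}_t\big| \le C. \qquad (5.2)$$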


Remark 8. Note that by Proposition 4.17 we know there exists a function $p^{(\infty)}(0,\cdot)$ such that $p^{(n)}(0,t) \to p^{(\infty)}(0,t)$ for $0 < t \le 1$. By the Bounded Convergence Theorem, $E(p^{(n)}(0,t)) \to p^{(\infty)}(0,t)$. Thus the above Proposition implies that $p^{(\infty)}(0,t) = p^{(\infty)}_t$. In particular it shows that the degree distribution owing to the change point is different from the degree distribution without change point. This is the final step in proving Theorem 2.1.


Remark 9. A similar result was shown in the context of no change point in [58, Section 8.6] and [26] (not just for leaves but for all fixed $k \ge 1$). Our proof uses slightly different ideas starting from the same point as in [58]. While we do not consider higher degree vertices, as in [58], the result above can be used as a building block to show identical error bounds for expectations of the number of higher degree vertices about limit constants.


Proof: To ease notation write $\vartheta_n(m) = E(N_n(m))$. The crux of the proof is studying a recursion relation for $\vartheta_n(m + 1)$ in terms of $\vartheta_n(m)$. We will give a careful analysis of the time period before the change point and then describe how the same ideas give the result for after the change point.

For each $1 < m \le n$ write $\mathcal{E}_{m+1}$ for the event that vertex $m + 1$ connects to a leaf vertex in $\mathcal{T}_m$. Then note that conditioning on $\mathcal{T}_m$, when $m < n\gamma$ we have

$$E(N_n(m + 1) \mid \mathcal{T}_m) = N_n(m) + 1 - P(\mathcal{E}_{m+1} \mid \mathcal{T}_m) = N_n(m) + 1 - \frac{(1+\alpha)N_n(m)}{(2+\alpha)m - 1}. \qquad (5.3)$$

When $m \ge n\gamma$ we have the same recursion as above but with $\alpha$ replaced by $\beta$. Taking full expectations and simplifying gives the following recursion:

$$E(N_n(m + 1) \mid \mathcal{T}_m) = 1 + w_m N_n(m), \qquad \vartheta_n(m + 1) = 1 + w_m \vartheta_n(m), \qquad (5.4)$$

where $\{w_m : 2 \le m \le n\}$ are as defined in (5.1).

Before the change point: Repeatedly using this recursion and the boundary condition $\vartheta_n(2) = 1$ gives, for $m + 1 \le n\gamma$,

$$\vartheta_n(m + 1) = \sum_{s=2}^{m+1} \prod_{k=s}^{m} \Big(1 - \frac{1+\alpha}{(2+\alpha)k - 1}\Big), \qquad (5.5)$$

where the empty product (the term $s = m + 1$) equals one. Now fix $s_0 \ge 1$ large enough such that the following three conditions hold:

(i) For all $k \ge s_0$,

$$\log k + \gamma \le \sum_{l=1}^{k} \frac{1}{l} \le (\log k + \gamma) + \frac{1}{k}.$$

Here $\gamma$ is the Euler-Mascheroni constant (in this display only, not the change point). See [8].

(ii) For all $k \ge s_0$, $1 - \frac{1+\alpha}{(2+\alpha)k - 1} \ge 1/2$.

(iii) We may choose a constant $C < \infty$ such that for all $k \ge 1$,

$$\sum_{i=k}^{\infty} \frac{1}{((2+\alpha)i - 1)^2} \le \frac{C}{k}. \qquad (5.6)$$

Further there exists a constant $C'$ such that for all $s > s_0$, $|\exp(C/s) - 1| \le C'/s$ and

$$\Big|1 - \frac{1+\alpha}{(2+\alpha)s - 1} - e^{-\frac{1+\alpha}{(2+\alpha)s - 1}}\Big| \le \frac{C'}{((2+\alpha)s - 1)^2}.$$

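To ease notation, for the rest of the proof let $\delta = (1+\alpha)/(2+\alpha)$. Using the elementary inequality $1 - x \le e^{-x}$ for $x \in (0,1)$ and the choice of $s_0$ above, the following inequalities with a constant $C = C(s_0,\alpha) < \infty$ are readily verified:

(A) For all $m \ge s \ge s_0$,

$$\Big|e^{-\sum_{i=s}^{m} \delta/i} - \Big(\frac{s}{m}\Big)^{\delta}\Big| \le C\,\frac{s^{\delta-1}}{m^{\delta}}. \qquad (5.7)$$

(B) For all $m \ge s \ge s_0$,

$$\Big|e^{-\sum_{i=s}^{m} \frac{1+\alpha}{(2+\alpha)i}} - e^{-\sum_{i=s}^{m} \frac{1+\alpha}{(2+\alpha)i - 1}}\Big| \le C\,\frac{s^{\delta-1}}{m^{\delta}}. \qquad (5.8)$$

(C) For all $m \ge s \ge s_0$,

$$\prod_{k=s}^{m} \Big(1 - \frac{1+\alpha}{(2+\alpha)k - 1}\Big) \le C \Big(\frac{s}{m}\Big)^{\delta}. \qquad (5.9)$$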


Now note that by the "Lindeberg" trick, for any $s \le m$ and two collections of non-negative numbers $\{w_k : s \le k \le m\}$ and $\{z_k : s \le k \le m\}$ we have

$$\Big|\prod_{k=s}^{m} w_k - \prod_{k=s}^{m} z_k\Big| \le \sum_{k=s}^{m} |w_k - z_k| \prod_{s \le l < k} w_l \prod_{l > k} z_l. \qquad (5.10)$$

Using this with $w_k = 1 - \frac{1+\alpha}{(2+\alpha)k - 1}$ and $z_k = e^{-\frac{1+\alpha}{(2+\alpha)k - 1}}$, and using (5.7), (5.8) and (5.9), gives the following Lemma.
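The telescoping bound (5.10) is easy to verify numerically for the specific choice $w_k = 1 - x_k$, $z_k = e^{-x_k}$ with $x_k = (1+\alpha)/((2+\alpha)k - 1)$. The sketch below is only an illustration with hypothetical parameter values; the function name is not from the paper. Since $w_k \le z_k$ for every $k$, all terms of the telescoping sum have the same sign, so the two sides should agree up to floating point error.

```python
import math


def product_gap_and_bound(alpha, s, m):
    """Return (|prod w_k - prod z_k|, right-hand side of the telescoping bound)."""
    x = [(1.0 + alpha) / ((2.0 + alpha) * k - 1.0) for k in range(s, m + 1)]
    w = [1.0 - xk for xk in x]           # w_k = 1 - x_k
    z = [math.exp(-xk) for xk in x]      # z_k = exp(-x_k)
    gap = abs(math.prod(w) - math.prod(z))
    bound = 0.0
    for i in range(len(w)):
        # |w_k - z_k| times the w's before position k and the z's after it.
        bound += abs(w[i] - z[i]) * math.prod(w[:i]) * math.prod(z[i + 1:])
    return gap, bound


if __name__ == "__main__":
    print(product_gap_and_bound(alpha=1.0, s=5, m=200))
```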


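Lemma 5.2. Fix $s_0$ as above. Writing $\delta = (1+\alpha)/(2+\alpha)$, there exists a constant $C < \infty$ such that for all $m \ge s \ge s_0$,

$$\Big|\prod_{k=s}^{m} \Big(1 - \frac{1+\alpha}{(2+\alpha)k - 1}\Big) - \Big(\frac{s}{m}\Big)^{\delta}\Big| \le C\,\frac{s^{\delta-1}}{m^{\delta}}.$$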


Now using the form of the expectation $\vartheta_n(m)$ in (5.5), the error bound in the above Lemma and the integral comparison

$$\frac{1}{m^{\delta}} \int_{s_0+1}^{m} x^{\delta}\,dx \le \sum_{s=s_0+2}^{m} \Big(\frac{s}{m}\Big)^{\delta} \le \frac{1}{m^{\delta}} \int_{s_0+2}^{m+1} x^{\delta}\,dx,$$

shows that there exists a constant $C$ such that for $m \le n\gamma$

$$\Big|\vartheta_n(m) - \frac{m}{1+\delta}\Big| \le C. \qquad (5.11)$$

This is the assertion for the expected number of leaves before the change point.
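The recursion (5.4) can be iterated directly, which gives a quick numerical sanity check of the bounded-error claim before the change point. This is a sketch under stated assumptions: the weights are taken to be $w_m = 1 - (1+a)/((2+a)m - 1)$ with $a = \alpha$ for $m < n\gamma$ and $a = \beta$ otherwise, as read off from (5.3), and the pre-change limit constant is taken to be $1/(1+\delta) = (2+\alpha)/(3+2\alpha)$. The function name and parameter values are hypothetical.

```python
def expected_leaves(n, gamma, alpha, beta):
    """Iterate theta(m + 1) = 1 + w_m * theta(m) from theta(2) = 1.

    Returns a dict mapping m (2 <= m <= n) to theta_n(m) = E(N_n(m)).
    """
    change = int(gamma * n)
    theta = {2: 1.0}
    for m in range(2, n):
        a = alpha if m < change else beta      # attachment parameter in force
        w_m = 1.0 - (1.0 + a) / ((2.0 + a) * m - 1.0)
        theta[m + 1] = 1.0 + w_m * theta[m]
    return theta


if __name__ == "__main__":
    n, gamma, alpha, beta = 10_000, 0.5, 1.0, 3.0
    theta = expected_leaves(n, gamma, alpha, beta)
    p_alpha = (2.0 + alpha) / (3.0 + 2.0 * alpha)
    worst = max(abs(theta[m] - m * p_alpha) for m in range(2, int(gamma * n) + 1))
    print("max pre-change error:", worst)
    print("final proportion:", theta[n] / n)
```

Before the change point one can check by induction that $\vartheta_n(m) = ((2+\alpha)m - 1)/(3+2\alpha)$ solves the recursion exactly, so the error in this regime is the constant $1/(3+2\alpha)$.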

After the change point: We now describe the evolution of $\vartheta_n(m)$ for $n\gamma < m \le n$. We only give the basic idea as the details are the same as before the change point. First note that by the above analysis, there exists a constant $C$ such that $|\vartheta_n(n\gamma) - n\gamma/(1+\delta)| \le C$. Now the evolution of the process after $n\gamma$ is as in (5.3) with $\alpha$ replaced by $\beta$. Thus starting at $m > n\gamma$ and using the argument above we get

$$\vartheta_n(m + 1) = \sum_{s=n\gamma+1}^{m+1} \prod_{j=s}^{m} \Big(1 - \frac{1+\beta}{(2+\beta)j - 1}\Big) + \vartheta_n(n\gamma) \prod_{j=n\gamma}^{m} \Big(1 - \frac{1+\beta}{(2+\beta)j - 1}\Big), \qquad (5.12)$$

where again the empty product equals one. Simplifying notation and writing $m = nt$ where $\gamma \le t \le 1$ and repeating the arguments above, it is easy to check that there exists a constant $C$ independent of $n$ such that

$$\big|\vartheta_n(nt) - nt\,p^{(\infty)}_t\big| \le C, \qquad (5.13)$$

where $p^{(\infty)}_t$ is as in (2.3). This completes the proof.
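To see the expectation bounds against actual sample paths, one can simulate the tree directly. The following is a minimal sketch, not the continuous time embedding used in the paper: it assumes that vertex $m + 1$ attaches to a vertex of $\mathcal{T}_m$ with probability proportional to out-degree $+\,1 + a$, with $a = \alpha$ for $m < n\gamma$ and $a = \beta$ afterwards, and it counts leaves as vertices of out-degree zero, which is what the leaf probability in (5.3) corresponds to. The function name, seed handling and parameter values are hypothetical.

```python
import random


def simulate_leaf_counts(n, gamma, alpha, beta, seed=0):
    """Grow a tree on vertices 1..n; leaf_counts[m] is the number of
    out-degree-zero vertices in T_m for 2 <= m <= n."""
    rng = random.Random(seed)
    change = int(gamma * n)
    out_degree = [0] * (n + 1)
    out_degree[1] = 1                 # T_2: vertex 2 is attached to vertex 1
    edge_parents = [1]                # one entry per edge, holding its parent
    leaves = 1                        # vertex 2 is the only leaf in T_2
    leaf_counts = [0, 0, 1] + [0] * (n - 2)
    for m in range(2, n):             # vertex m + 1 attaches to a vertex of T_m
        a = alpha if m < change else beta
        total_weight = (2.0 + a) * m - 1.0
        if rng.random() < (m - 1) / total_weight:
            target = rng.choice(edge_parents)   # proportional to out-degree
        else:
            target = rng.randint(1, m)          # uniform: the "1 + a" part
        if out_degree[target] == 0:
            leaves -= 1                         # target stops being a leaf
        out_degree[target] += 1
        edge_parents.append(target)
        leaves += 1                             # the new vertex is a leaf
        leaf_counts[m + 1] = leaves
    return leaf_counts


if __name__ == "__main__":
    counts = simulate_leaf_counts(20_000, gamma=0.5, alpha=1.0, beta=3.0, seed=1)
    for t in (0.25, 0.5, 0.75, 1.0):
        m = int(t * 20_000)
        print(t, counts[m] / m)
```

The mixture step works because the total weight $(2+a)m - 1$ splits into $m - 1$ (the sum of out-degrees) and $m(1 + a)$ (a uniform choice over the $m$ existing vertices).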


5.2. Proof of Theorem 2.3. A central limit theorem for the number of leaves $N_n(n)$ (in fact all degree counts $N_n(k,n)$) at time $n$ in the setting of no change point was established in [48]. We will extend this to a functional central limit theorem in the change point setting.

First recall the function $\delta_\alpha$ from (2.4). Define the stochastic process

$$M^*_n(t) = \begin{cases} t^{\delta_\alpha}\, \dfrac{N_n(nt) - \vartheta_n(nt)}{\sqrt{n}} & \text{if } t < \gamma, \\[2ex] \gamma^{\delta_\alpha} \Big(\dfrac{t}{\gamma}\Big)^{\delta_\beta} \dfrac{N_n(nt) - \vartheta_n(nt)}{\sqrt{n}} & \text{if } t \ge \gamma. \end{cases} \qquad (5.14)$$

Recall the process $M(\cdot)$ in (2.8) and the relationship between $M$ and $G$. Using Proposition 5.1 and the continuous mapping theorem, it is enough to show the following result.

Proposition 5.3. We have $M^*_n(\cdot) \xrightarrow{w} M(\cdot)$ on $D[0,1]$ as $n \to \infty$.

Proof: The main idea is to study martingales associated with $\{N_n(m) : 2 \le m \le n\}$ and then use the martingale functional central limit theorem. There are an enormous number of variants of such functional limit theorems under a multitude of conditions. We quote the specific form relevant to this setting. Recall the function $\phi(\cdot)$ and the corresponding diffusion $M(\cdot)$ defined in (2.9).


Theorem 5.4 ([25, 29]). For each $n \ge 1$, let $\{M_n(m) : 1 \le m \le n\}$ be a mean zero martingale with finite second moments adapted to a filtration $\{\mathcal{F}_n(m) : 1 \le m \le n\}$. Write $\{X_n(m) : 1 \le m \le n\}$ for the associated martingale difference sequence, namely $X_n(m) = M_n(m) - M_n(m - 1)$ with $M_n(0) = 0$. Assume the following two hypotheses:

(i) For each $0 \le t \le 1$,

$$V_n(nt) := \sum_{m=1}^{nt} E\big([X_n(m)]^2 \mid \mathcal{F}_n(m - 1)\big) \xrightarrow{P} \phi(t), \qquad \text{as } n \to \infty. \qquad (5.15)$$

(ii) For each fixed $\varepsilon > 0$,

$$\sum_{m=1}^{n} E\big([X_n(m)]^2\, 1\{|X_n(m)| > \varepsilon\} \mid \mathcal{F}_n(m - 1)\big) \xrightarrow{P} 0. \qquad (5.16)$$

Then defining the process $M_n(t) := M_n(nt)$, one has $M_n \xrightarrow{w} M$ in $D[0,1]$.

For our example (following [48]) define the process

$$N^*_n(m) = \frac{N_n(m) - \vartheta_n(m)}{\prod_{j=2}^{m-1} w_j}, \qquad 2 < m \le n. \qquad (5.17)$$

Here $w_j$ is as in (5.1). Using the recursion (5.4) results in the following Lemma.

Lemma 5.5. The process $N^*_n$ is a martingale with respect to the filtration generated by $\{\mathcal{T}_m : 2 \le m \le n\}$.
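The one-step martingale identity behind Lemma 5.5 can be checked in exact rational arithmetic. The sketch below covers only the regime before the change point and assumes the reading of (5.3) in which, given $N_n(m) = N$, the new vertex hits a leaf with probability $q = (1+\alpha)N/((2+\alpha)m - 1)$, so that $N_n(m+1)$ equals $N$ with probability $q$ and $N + 1$ otherwise. Function names are hypothetical.

```python
from fractions import Fraction


def w(m, alpha):
    """Weight w_m = 1 - (1 + alpha) / ((2 + alpha) m - 1), as an exact Fraction."""
    return 1 - (1 + alpha) / ((2 + alpha) * m - 1)


def martingale_step_holds(m, leaves, alpha):
    """Check E(N*(m + 1) | N_n(m) = leaves) == N*(m) exactly (pre-change regime)."""
    theta = Fraction(1)               # theta_n(2) = 1
    prod = Fraction(1)                # empty product of w_j
    for j in range(2, m):
        theta = 1 + w(j, alpha) * theta   # becomes theta_n(j + 1)
        prod *= w(j, alpha)               # becomes prod_{i=2}^{j} w_i
    # Here theta = theta_n(m) and prod = prod_{j=2}^{m-1} w_j.
    q = (1 + alpha) * leaves / ((2 + alpha) * m - 1)   # P(new vertex hits a leaf)
    cond_mean_next = (1 - q) * (leaves + 1) + q * leaves
    theta_next = 1 + w(m, alpha) * theta
    lhs = (cond_mean_next - theta_next) / (prod * w(m, alpha))
    rhs = (leaves - theta) / prod
    return lhs == rhs


if __name__ == "__main__":
    alpha = Fraction(1, 2)
    print(all(martingale_step_holds(m, k, alpha)
              for m in range(2, 12) for k in range(1, m)))
```

Using `Fraction` for $\alpha$ keeps every intermediate quantity rational, so the equality test is exact rather than subject to floating point tolerance.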

Now define the corresponding martingale differences $d_n(m) = N^*_n(m) - N^*_n(m - 1)$. Define $A_n(m) = 1\{\text{vertex } m \text{ connects to a non-leaf vertex in } \mathcal{T}_{m-1}\}$. Then simple algebra and (5.4) imply that for $m \le n\gamma$

$$d_n(m) = \frac{1}{\prod_{j=2}^{m-1} w_j} \Big[A_n(m) + \frac{(1+\alpha)N_n(m - 1)}{(2+\alpha)(m - 1) - 1} - 1\Big] \qquad (5.18)$$

and

$$E(A_n(m) \mid \mathcal{T}_{m-1}) = 1 - \frac{(1+\alpha)N_n(m - 1)}{(2+\alpha)(m - 1) - 1}. \qquad (5.19)$$

For $m \ge n\gamma$ we have identical formulae as (5.18) and (5.19) but now $\alpha$ is replaced by $\beta$. For the rest of the argument we will replace the denominator for the second term, $(2+\alpha)(m - 1) - 1$, by $(2+\alpha)(m - 1)$. It is easy to check that the error is negligible, and this will ease presentation.


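Now use Proposition 4.17, which allows us to uniformly approximate $N_n(m - 1)/(m - 1)$ by $p^{(\infty)}_{m/n}$. Further, the asymptotics of $\prod_{j=2}^{m-1} w_j$ derived in the previous Section imply that for $m \le n\gamma$, $\prod_{j=2}^{m-1} w_j \sim m^{-\delta_\alpha}$, while for $m > n\gamma$, $\prod_{j=2}^{m-1} w_j \sim (n\gamma)^{-\delta_\alpha} (m/n\gamma)^{-\delta_\beta}$, where $\delta_\alpha, \delta_\beta$ are as defined in (2.4). Taking conditional expectations in (5.18), using (5.19) and using the above approximations results in

$$E\big([d_n(m)]^2 \mid \mathcal{T}_{m-1}\big) \sim \begin{cases} m^{2\delta_\alpha}\, \delta_\alpha p^{(\infty)}_{m/n} \big(1 - \delta_\alpha p^{(\infty)}_{m/n}\big) & \text{if } m \le n\gamma, \\[1ex] (n\gamma)^{2\delta_\alpha} \Big(\dfrac{m}{n\gamma}\Big)^{2\delta_\beta} \delta_\beta p^{(\infty)}_{m/n} \big(1 - \delta_\beta p^{(\infty)}_{m/n}\big) & \text{if } m \ge n\gamma. \end{cases} \qquad (5.20)$$

Now consider the martingale

$$M_n(m) := \frac{1}{n^{\delta_\alpha + 1/2}}\, \frac{N_n(m) - \vartheta_n(m)}{\prod_{j=2}^{m-1} w_j}, \qquad 2 < m \le n. \qquad (5.21)$$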


We will apply Theorem 5.4 to this martingale. Let $\{X_n(m) : 2 \le m \le n\}$ denote the corresponding martingale differences. First fix $t \le \gamma$ and recall the definition of the cumulative conditional variance $V_n(nt)$ till time $t$ in (5.15). Using the first expression in (5.20) we get

$$V_n(nt) \sim \frac{1}{n^{2\delta_\alpha + 1}} \sum_{j=1}^{nt} j^{2\delta_\alpha}\, \delta_\alpha p^{(\infty)}_{j/n} \big(1 - \delta_\alpha p^{(\infty)}_{j/n}\big) \to \int_0^t s^{2\delta_\alpha} \big[\delta_\alpha p^{(\infty)}_s \big(1 - \delta_\alpha p^{(\infty)}_s\big)\big]\, ds = \phi(t),$$

as $n \to \infty$. Thus (5.15) is satisfied for $t \le \gamma$. A similar calculation, now incorporating the second expression in (5.20), implies that (5.15) is satisfied for all $t \in [0,1]$ with $\phi$ as in (2.9). Now let us check the second condition, namely (5.16). Note that for $m \le n\gamma$, $|X_n(m)| \ge \varepsilon$ implies that $3m^{\delta_\alpha} \ge \varepsilon n^{\delta_\alpha + 1/2}$. For large $n$ this is impossible for all $m \le n\gamma$. A similar calculation for $m > n\gamma$ completes the proof of (5.16). Using Theorem 5.4 we get that $M_n(n\,\cdot) \xrightarrow{w} M(\cdot)$ in $D[0,1]$. Using the asymptotics for $\prod_{j=2}^{m} w_j$ derived in Section 5.1 in (5.21) now completes the proof of Proposition 5.3 and thus Theorem 2.3.


6. Consistency of the estimator 


The aim of this Section is to prove Theorem 2.4. Fix a truncation level $\varepsilon > 0$ from zero as in the Theorem. Recall the time averaged proportion of leaves before and after each time $t$, namely (2.15) and (2.14). Also recall the expression for the limiting proportion of leaves from (2.3). For any fixed interval $[s,t] \subset [0,1]$, define $H[s,t]$ by

$$H[s,t] := \frac{1}{t - s} \int_s^t p^{(\infty)}_u\, du. \qquad (6.1)$$


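The interpretation is as follows: the above gives the expected proportion of leaves in the large network limit if one were to sample a time point $U \in [s,t]$ uniformly at random. Now define the two functions ${}_t h^{(\infty)}$ and $h^{(\infty)}_t$ via:

(a) Case 1: For $\varepsilon \le t \le \gamma$,

$${}_t h^{(\infty)} := p^{(\infty)}_\gamma, \qquad h^{(\infty)}_t := \frac{1}{1 - t} \Big[(\gamma - t)\, p^{(\infty)}_\gamma + (1 - \gamma)\, H[\gamma, 1]\Big].$$

(b) Case 2: For $t > \gamma$,

$${}_t h^{(\infty)} := \frac{\gamma - \varepsilon}{t - \varepsilon}\, p^{(\infty)}_\gamma + \frac{t - \gamma}{t - \varepsilon}\, H[\gamma, t], \qquad h^{(\infty)}_t := H[t, 1]. \qquad (6.2)$$

Define analogously to (2.16) the function

$$D(t) := (1 - t)\, \big|{}_t h^{(\infty)} - h^{(\infty)}_t\big|, \qquad t \in [\varepsilon, 1].$$

Routine algebra shows that

$$D(t) = \begin{cases} (1 - \gamma)\, \big|p^{(\infty)}_\gamma - H[\gamma, 1]\big| & \text{for } \varepsilon \le t \le \gamma, \\[1ex] (1 - \varepsilon)\, \big|H[\varepsilon, t] - H[\varepsilon, 1]\big| & \text{for } t > \gamma. \end{cases} \qquad (6.3)$$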

Using the form of the limit proportion $p^{(\infty)}_t$ from (2.3), the following result is easy to check.

Lemma 6.1. Fix $\varepsilon < \gamma$ and assume $\alpha \ne \beta$. Then $D(\cdot)$ is a continuous function on $[\varepsilon, 1]$ such that $D(\cdot)$ is constant on the interval $[\varepsilon, \gamma]$ and then is strictly monotonically decreasing on the interval $[\gamma, 1]$ with $D(t) \to 0$ as $t \to 1$. Further the function has a strictly negative right derivative at $\gamma$, namely

$$\partial_+ D(\gamma) := \lim_{t \downarrow \gamma} \frac{D(t) - D(\gamma)}{t - \gamma} < 0. \qquad (6.4)$$
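The shape of $D(\cdot)$ described in Lemma 6.1 can be explored numerically. The sketch below rests on an explicit assumption: since (2.3) is not reproduced in this section, the limiting leaf proportion is replaced by the closed form obtained from a continuum approximation of the recursion (5.4), namely $p_t = p_\alpha$ for $t \le \gamma$ and $p_t = p_\beta + (p_\alpha - p_\beta)(\gamma/t)^{1+\delta_\beta}$ for $t > \gamma$, with $p_a = (2+a)/(3+2a)$ and $\delta_\beta = (1+\beta)/(2+\beta)$. All function names and parameter values are hypothetical, and $D$ is evaluated only for $\varepsilon < t < 1$ to avoid dividing by zero.

```python
def leaf_limit_integral(u, gamma, alpha, beta):
    """Integral over [0, u] of the assumed limiting leaf proportion p_s."""
    d_b = (1.0 + beta) / (2.0 + beta)
    p_a = (2.0 + alpha) / (3.0 + 2.0 * alpha)
    p_b = (2.0 + beta) / (3.0 + 2.0 * beta)
    if u <= gamma:
        return p_a * u
    # Integral over [gamma, u] of (p_a - p_b) * (gamma / s) ** (1 + d_b).
    tail = (p_a - p_b) * gamma * (1.0 - (gamma / u) ** d_b) / d_b
    return p_a * gamma + p_b * (u - gamma) + tail


def H(s, t, gamma, alpha, beta):
    """Average of p_u over the interval [s, t], s < t."""
    total = leaf_limit_integral(t, gamma, alpha, beta) \
        - leaf_limit_integral(s, gamma, alpha, beta)
    return total / (t - s)


def D(t, eps, gamma, alpha, beta):
    """(1 - t) |H[eps, t] - H[t, 1]| for eps < t < 1."""
    before = H(eps, t, gamma, alpha, beta)
    after = H(t, 1.0, gamma, alpha, beta)
    return (1.0 - t) * abs(before - after)


if __name__ == "__main__":
    eps, gamma, alpha, beta = 0.1, 0.5, 1.0, 3.0
    for t in (0.2, 0.4, 0.5, 0.6, 0.8, 0.99):
        print(t, D(t, eps, gamma, alpha, beta))
```

Under this assumed form of $p_t$, the printed values are flat up to $\gamma$ and decrease afterwards, which matches the qualitative picture in the Lemma.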


Theorem 2.3 immediately yields the following result.

Lemma 6.2. Fix $\varepsilon > 0$. Then

$$\sup_{t \in [\varepsilon, 1]} |D_n(t) - D(t)| = O_P\Big(\frac{1}{\sqrt{n}}\Big).$$

Now combining Lemmas 6.1 and 6.2 completes the proof.